Abstract - A linear code can be thought of as a
vector matroid represented by the columns of the code's
generator matrix; a well-known result in this context
is Greene's theorem on a connection between the weight
polynomial of the code and the Tutte polynomial of
the matroid. We examine this connection from the
coding-theoretic viewpoint, building upon the rank
polynomial of the code. This enables us

•  to  relate  the  weight  polynomial  of  codes  and  the 
reliability  polynomial  of  linear  matroids  and  to 
prove  new  bounds  on  the  latter; 

•  to  prove  that  the  partition  polynomial  of  the 
Potts  model  equals  the  weight  polynomial  of  the 
cocycle  code  of  the  underlying  graph,  and 

•  to  give  a  simple  proof  of  Greene’s  theorem  and 
its  generalization. 

I.  INTRODUCTION 

Let $C$ be a linear code of length $n$ and let $E = \{1, 2, \dots, n\}$ be
its coordinate set. The weight polynomial of $C$ is defined as
$A(x,y) = \sum_{i=0}^{n} A_i x^{n-i} y^i$, where $A_i$ is the number of vectors
of Hamming weight $i$ in $C$. Let $G$ be a generator matrix of $C$.
By $G(F)$ we denote the submatrix of $G$ formed by the columns
with numbers in $F \subseteq E$. The rank polynomial of $C$ is defined
as $U(x,y) = \sum_{u=0}^{n} \sum_{v=0}^{u} U_u^v x^u y^v$, where

$$U_u^v = |\{F \subseteq E : |F| = u,\ \mathrm{rk}(G(F)) = v\}|.$$

The  polynomials  A(x,y)  and  U(x,y)  are  connected  by  the 
following  relation,  equivalent  to  Greene’s  theorem  [3]. 

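Theorem 1:

$$A(x,y) = y^n |C|\, U\!\left(\frac{x-y}{y}, \frac{1}{q}\right), \qquad (1)$$

where $q$ is the size of the field over which $C$ is defined.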

The code $C$ can also be thought of as a (vector) matroid $M$
represented by the columns of $G$; so given $M$, we call $C$
the code of $M$, denoted $C(M)$.
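The relation between the weight polynomial and the rank polynomial can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it takes a binary code ($q = 2$) given by a full-rank generator matrix, uses the relation in the form $A(x,y) = y^n |C|\, U((x-y)/y, 1/q)$, and picks the [7,4] Hamming code as the test case. The function names `gf2_rank`, `weight_counts`, `rank_counts`, `A_direct` and `A_from_rank` are hypothetical helpers, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations, product


def gf2_rank(vectors):
    """Rank over GF(2) of vectors packed into Python ints (bitmasks)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if it is set
        if v:
            basis.append(v)
    return len(basis)


def weight_counts(G):
    """A[i] = number of codewords of weight i (G is assumed to have full rank)."""
    k, n = len(G), len(G[0])
    A = [0] * (n + 1)
    for msg in product((0, 1), repeat=k):
        word = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        A[sum(word)] += 1
    return A


def rank_counts(G):
    """U[(u, v)] = number of column subsets F with |F| = u and rk G(F) = v."""
    n = len(G[0])
    cols = [sum(bit << r for r, bit in enumerate(col)) for col in zip(*G)]
    U = {}
    for u in range(n + 1):
        for F in combinations(range(n), u):
            v = gf2_rank(cols[j] for j in F)
            U[(u, v)] = U.get((u, v), 0) + 1
    return U


def A_direct(A, x, y):
    """Evaluate the weight polynomial from its coefficients."""
    n = len(A) - 1
    return sum(a * x ** (n - i) * y ** i for i, a in enumerate(A))


def A_from_rank(U, n, code_size, q, x, y):
    """Evaluate y^n |C| U((x - y)/y, 1/q) from the rank counts."""
    s, t = (x - y) / y, Fraction(1, q)
    return y ** n * code_size * sum(c * s ** u * t ** v for (u, v), c in U.items())


# Generator matrix of the [7,4] binary Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
A = weight_counts(G)  # [1, 0, 0, 7, 7, 0, 0, 1]
U = rank_counts(G)
x, y = Fraction(3), Fraction(2)
print(A_direct(A, x, y) == A_from_rank(U, 7, 16, 2, x, y))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.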

II.  Reliability  polynomial 

Let $M$ be a linear matroid of rank $k$ on the ground set $E$
of size $n$ defined by its representation over $F$, and let $U_i^i$ be
its number of independent sets of size $i$. The (all-terminal)
reliability polynomial of $M$, by definition, is

$$\mathcal{R}(M; x, y) = \sum_{i=0}^{k} U_i^i x^{n-i} y^i. \qquad (2)$$

The terminology is motivated by the special case of cographic
matroids. Namely, let $G(V,E)$ be a connected graph and let
$M$ be a matroid whose independent sets are given by subsets of
edges whose removal does not make $G$ disconnected. Suppose
that each edge in $E$ is removed with probability $p$. Then the
probability that upon completion of this process the graph
remains connected is given by $\mathcal{R}(M; p, 1-p)$. Reliability of
graphs and matroids has been a subject of continued interest
in combinatorics [2]. The main result of this section is:

Theorem 2: Let $A(x,y)$ be the weight polynomial of the linear
code $C(M)$. Then

$$\mathcal{R}(M; p, 1-p) \le U_k^k\, p^{n-k}(1-p)^k + A(1,p) - 1. \qquad (3)$$

In this way the reliability polynomial can be related to the
probability of undetected error for linear codes; the upper
bounds on the latter are used in the paper to derive new upper
bounds on $\mathcal{R}(M; p, 1-p)$.

III.  Partition  function 

Let $\Gamma = (V,E)$ be a finite graph with $|E| = n$ edges and
$c(\Gamma)$ connected components. Consider the Potts model of interaction
for a physical system represented by $\Gamma$ [4]. Under
this model each vertex in $V$ can be in one of $q$ possible states;
an allocation of states to all the vertices defines a state $\sigma$ of
the system or a coloring of $V$ with $q$ colors. The partition
function of the Potts model is defined as follows:

$$Z(y) = \sum_{\sigma} y^{-|U(\sigma)|},$$

where the sum is over all possible states $\sigma$ of the system and
$U(\sigma)$ is the subset of edges with both ends of the same color.

Theorem 3: Let $A(x,y)$ be the weight polynomial of the $q$-ary
cocycle code of $\Gamma$. Then

$$A(1,y) = q^{-c(\Gamma)} y^n Z(y).$$
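A brute-force check of this identity on a small graph may help to see why it holds: every coloring $\sigma$ yields a codeword whose coordinate on edge $(u,v)$ is $\sigma(u) - \sigma(v)$, each codeword arises from $q^{c(\Gamma)}$ colorings, and its weight is $n - |U(\sigma)|$. The sketch below assumes that $q$ is prime (so that arithmetic modulo $q$ is field arithmetic) and that the partition function has the form $Z(y) = \sum_\sigma y^{-|U(\sigma)|}$; the function names and the example graph are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def components(num_vertices, edges):
    """Number of connected components, via a small union-find."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(num_vertices)})


def potts_Z(num_vertices, edges, q, y):
    """Z(y) = sum over colorings of y^(-number of monochromatic edges)."""
    total = Fraction(0)
    for sigma in product(range(q), repeat=num_vertices):
        mono = sum(1 for u, v in edges if sigma[u] == sigma[v])
        total += Fraction(y) ** (-mono)
    return total


def cocycle_code(num_vertices, edges, q):
    """All vectors (sigma(u) - sigma(v) mod q) indexed by the edges (u, v)."""
    return {
        tuple((sigma[u] - sigma[v]) % q for u, v in edges)
        for sigma in product(range(q), repeat=num_vertices)
    }


def weight_poly_at_one(code, y):
    """A(1, y) = sum over codewords of y^weight."""
    return sum(Fraction(y) ** sum(1 for c in word if c) for word in code)


# Triangle with a pendant edge, q = 3 (illustrative example).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
q, y = 3, Fraction(2, 5)
lhs = weight_poly_at_one(cocycle_code(4, edges, q), y)
rhs = Fraction(1, q ** components(4, edges)) * y ** len(edges) * potts_Z(4, edges, q, y)
print(lhs == rhs)
```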

Further details are found in [1].

Abstract - Denote by $R$ the Galois ring of characteristic
$p^e$ and cardinality $p^{em}$, where $p$ is a prime and
$e$ and $m$ are positive integers. Let $g(x)$ be a monic
polynomial over $\mathbb{F}_{p^m}$. A polynomial $f(x)$ over $R$ is defined
to be a Hensel lift of $g(x)$ in $R[x]$ if $\bar{f}(x) = g(x)$,
where $\bar{\ }$ is the natural homomorphism from $R$ onto
$\mathbb{F}_{p^m}$, and there is a positive integer $n$ not divisible by
$p$ such that $f(x)$ divides $x^n - 1$ in $R[x]$. It is proved that
$g(x)$ has a unique Hensel lift in $R[x]$ if and only if $g(x)$
has no multiple roots and $x \nmid g(x)$. An algorithm to
compute the Hensel lift is also given.

I.  Definition 

In 1995 the following definition of the Hensel lift of a polynomial
appeared in [1].

Let $h_2 \in \mathbb{F}_2[x]$ be of degree $m > 0$ and assume that $h_2 \mid (x^l - 1)$
and $l$ is minimal subject to this property. There is a unique
monic polynomial $h \in \mathbb{Z}_4[x]$ of degree $m$ such that $\bar{h} = h_2$ and
$h \mid (x^l - 1)$ in $\mathbb{Z}_4[x]$. This polynomial is called the Hensel lift
of $h_2(x)$.

In the above definition the condition that $l$ is odd should
be added. A counter-example when $l$ is even is: $h_2(x) =
(x-1)^2(x^2+x+1)$, $h_2 \mid (x^6 - 1)$ in $\mathbb{F}_2[x]$, $h = (x^2 - 1)(x^2+x+1)$
and $h' = (x^2 - 1)(x^2 - x + 1)$.
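The counter-example can be verified mechanically: both candidate lifts divide $x^6 - 1$ over $\mathbb{Z}_4$, both reduce modulo 2 to the same polynomial $(x-1)^2(x^2+x+1) = x^4 + x^3 + x + 1$, and yet they differ modulo 4. The sketch below is a minimal check; the helper `poly_rem_monic` and the coefficient-list representation (lowest degree first) are assumptions made for illustration.

```python
def poly_rem_monic(a, f, mod):
    """Remainder of a(x) divided by a monic f(x), coefficients taken modulo `mod`.

    Polynomials are lists of coefficients, lowest degree first.
    """
    a = [c % mod for c in a]
    for i in range(len(a) - len(f), -1, -1):
        c = a[i + len(f) - 1]          # current leading coefficient
        for j, fj in enumerate(f):
            a[i + j] = (a[i + j] - c * fj) % mod
    return a[: len(f) - 1]


x6_minus_1 = [-1, 0, 0, 0, 0, 0, 1]
h = [-1, -1, 0, 1, 1]         # (x^2 - 1)(x^2 + x + 1) = x^4 + x^3 - x - 1
h_prime = [-1, 1, 0, -1, 1]   # (x^2 - 1)(x^2 - x + 1) = x^4 - x^3 + x - 1
h2 = [1, 1, 0, 1, 1]          # (x - 1)^2 (x^2 + x + 1) over F_2

print(poly_rem_monic(x6_minus_1, h, 4), poly_rem_monic(x6_minus_1, h_prime, 4))
print([c % 2 for c in h] == h2, [c % 2 for c in h_prime] == h2)
print([c % 4 for c in h] != [c % 4 for c in h_prime])
```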

The formulation of the above definition involves some statements
which should be proved. Now we suggest a simpler definition
which can be formulated for an arbitrary Galois ring.
For Galois rings, see [2] and [3].

Let $g(x)$ be a monic polynomial over $\mathbb{F}_{p^m}$. A monic polynomial
$f(x)$ over $R$ is called a Hensel lift of $g(x)$ if $\bar{f}(x) = g(x)$
and there is a positive integer $n$ not divisible by $p$ such that
$f(x) \mid (x^n - 1)$ in $R[x]$.

II.  Existence  and  Uniqueness 

Proposition 1. A monic polynomial $g(x)$ over $\mathbb{F}_{p^m}$ has a
Hensel lift $f(x)$ over $R$ if and only if $g(x)$ has no multiple
roots and $x \nmid g(x)$ in $\mathbb{F}_{p^m}[x]$.

Lemma 2. Let $n_1$ and $n_2$ be positive integers and $n =
\gcd(n_1, n_2)$. Then $x^n - 1 = \gcd(x^{n_1} - 1, x^{n_2} - 1)$ in $\mathbb{F}_{p^m}[x]$,
$(x^n - 1) \mid (x^{n_1} - 1)$ in $R[x]$, and $(x^n - 1) \mid (x^{n_2} - 1)$ in $R[x]$.

Proposition 3. Let $g(x)$ be a monic polynomial over $\mathbb{F}_{p^m}$
without multiple roots and $x \nmid g(x)$ in $\mathbb{F}_{p^m}[x]$. Then $g(x)$ has
a unique Hensel lift in $R[x]$.

III.  An  Algorithm  to  Compute  the  Hensel  Lift 

Based on Propositions 1 and 3 of the preceding section we
formulate the following algorithm for computing the Hensel
lift of a monic polynomial over $\mathbb{F}_{p^m}$ in $R[x]$.

Algorithm. Given a monic polynomial $g(x)$ of degree
$> 0$ over $\mathbb{F}_{p^m}$, to compute the Hensel lift of $g(x)$ in $R[x]$ we
proceed in the following steps.

1. Test whether $x \mid g(x)$ in $\mathbb{F}_{p^m}[x]$.

If yes, we are finished and $g(x)$ has no Hensel lift in
$R[x]$.

If no, go to step 2.

2. Compute $\gcd(g(x), g'(x))$ and let it be $d(x)$.

If $\deg d(x) > 0$, we are finished and $g(x)$ has no Hensel
lift in $R[x]$.

If $\deg d(x) = 0$, go to step 3.

3. Factorize $g(x)$ into a product of distinct monic irreducible
polynomials over $\mathbb{F}_{p^m}$ by Berlekamp's Algorithm.
Let the result be

$$g(x) = g_1(x) g_2(x) \cdots g_r(x),$$

where $g_1(x), g_2(x), \dots, g_r(x)$ are distinct monic irreducible
polynomials over $\mathbb{F}_{p^m}$. Let $\deg g_i(x) = n_i$, $i =
1, 2, \dots, r$, and go to step 4.

4. Compute $\operatorname{lcm}[p^{mn_1} - 1, p^{mn_2} - 1, \dots, p^{mn_r} - 1]$. Let the
result be $n$; then $p$ does not divide $n$ and $g(x) \mid (x^n - 1)$.
Go to step 5.

5. Divide $x^n - 1$ by $g(x)$ by the division algorithm. Let the
quotient be $g_1(x)$. Then $x^n - 1 = g(x) g_1(x)$ and
$\gcd(g(x), g_1(x)) = 1$. Go to step 6.

6. By the constructive proof of Hensel's Lemma construct
two coprime monic polynomials $f(x), f_1(x) \in R[x]$
such that $x^n - 1 = f(x) f_1(x)$ in $R[x]$ and $\bar{f}(x) =
g(x)$, $\bar{f}_1(x) = g_1(x)$. Then $f(x)$ is the Hensel lift of
$g(x)$ in $R[x]$. □

When $\mathbb{F}_{p^m} = \mathbb{F}_2$ and $R = \mathbb{Z}_4$, the Hensel lift of a polynomial
$g(x)$ over $\mathbb{F}_2$ without multiple roots and not divisible by
$x$ can be calculated by using Graeffe's method for finding a
polynomial whose roots are the squares of the roots of $g(x)$;
see [4] and [5].
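For the $\mathbb{Z}_4$ case, Graeffe's method can be sketched in a few lines: treating the coefficients of $g$ as the integers 0 and 1, one forms $g(x)g(-x)$ modulo 4, which is a polynomial in $x^2$, and reads off $h$ from $h(x^2) = (-1)^{\deg g}\, g(x) g(-x)$. The code below is a hedged illustration of that idea, not the algorithm from [4] or [5] verbatim; the function names and the coefficient-list representation (lowest degree first) are assumptions. It reproduces the familiar lift $x^3 + 2x^2 + x + 3$ of $x^3 + x + 1$, which divides $x^7 - 1$ over $\mathbb{Z}_4$.

```python
def poly_mul(a, b, mod):
    """Product of two coefficient lists (lowest degree first), reduced modulo `mod`."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % mod
    return out


def graeffe_lift_z4(g):
    """Candidate Hensel lift over Z_4 of a monic g over F_2 (coefficients 0/1).

    Assumes g has no multiple roots and is not divisible by x.
    """
    g_neg = [-c if i % 2 else c for i, c in enumerate(g)]  # coefficients of g(-x)
    prod = poly_mul(g, g_neg, 4)                            # g(x) g(-x), a polynomial in x^2
    sign = -1 if (len(g) - 1) % 2 else 1                    # (-1)^deg g makes the result monic
    return [(sign * c) % 4 for c in prod[::2]]


def poly_rem_monic(a, f, mod):
    """Remainder of a(x) divided by a monic f(x), coefficients modulo `mod`."""
    a = [c % mod for c in a]
    for i in range(len(a) - len(f), -1, -1):
        c = a[i + len(f) - 1]
        for j, fj in enumerate(f):
            a[i + j] = (a[i + j] - c * fj) % mod
    return a[: len(f) - 1]


g = [1, 1, 0, 1]                      # x^3 + x + 1 over F_2
h = graeffe_lift_z4(g)                # [3, 1, 2, 1], i.e. x^3 + 2x^2 + x + 3
x7_minus_1 = [-1, 0, 0, 0, 0, 0, 0, 1]
print(h, poly_rem_monic(x7_minus_1, h, 4))
```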

Abstract - Binary superimposed codes were introduced
by W. H. Kautz and R. C. Singleton in 1964 [1].
In [2] a concept of superimposed code distance was
suggested. In 1996 a new construction based on the
incidence of finite sets was suggested [3]. It was
studied and generalized in [4, 5]. We consider a
further extension of this construction, which allows one
to create new superimposed codes from existing
ones. We also find the superimposed distance for this
construction. Part of this work was presented in [6].

I.  Notations  and  Definitions 

Definition 1. An incidence system is a triplet $\mathcal{I} =
(A, B, \prec)$, where $A$ and $B$ are finite sets and $\prec$ is an incidence
relation between them, i.e. for any $a \in A$ and $b \in B$ either
$a \prec b$ or $a \nprec b$. Put $N = N(\mathcal{I}) = |A|$ and $t = t(\mathcal{I}) = |B|$.
An incidence matrix of $\mathcal{I}$ is the binary $N \times t$ matrix $X(\mathcal{I})$ whose
rows and columns are indexed by elements $a \in A$ and $b \in B$,
respectively, and whose element $x_a(b) = 1$ iff $a \prec b$.

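For an incidence system $\mathcal{I}$ and an integer $s > 0$ put

$$\mathcal{P}_s(\mathcal{I}) = \{(\tau, b) : \tau \subseteq B,\ |\tau| \le s,\ b \in B \setminus \tau\}.$$

Definition 2. A pair $(\tau, b) \in \mathcal{P}_s(\mathcal{I})$ is called disjunctive if
the disjunctive set of this pair $D(\tau, b) \ne \emptyset$, where

$$D(\tau, b) = \{a \in A : a \prec b,\ a \nprec b' \text{ for } b' \in \tau\}.$$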

Definition 3. For a system $\mathcal{I}$ and an integer $s > 0$ the
value

$$D_s(\mathcal{I}) \triangleq \min_{(\tau, b) \in \mathcal{P}_s(\mathcal{I})} |D(\tau, b)|$$

is called the superimposed $s$-distance of $\mathcal{I}$.

Definition 4. If the superimposed $s$-distance $D_s(\mathcal{I}) > 0$
(i.e. all pairs $(\tau, b) \in \mathcal{P}_s(\mathcal{I})$ are disjunctive) then $\mathcal{I}$ is called
an $s$-disjunct system. In this case the incidence matrix $X(\mathcal{I})$
is called a superimposed code of strength $s$, size $t(\mathcal{I})$ and length
$N(\mathcal{I})$ [1, 2]. The value $D_s(\mathcal{I})$ is called the superimposed distance
of this code [2].
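The superimposed $s$-distance can be computed directly from an incidence matrix by exhaustive search, which is feasible only for small systems. The following sketch assumes the reading $|\tau| \le s$ in the definition of the pairs $(\tau, b)$ and represents the matrix as a list of rows; `superimposed_distance` is a hypothetical helper name.

```python
from itertools import combinations


def superimposed_distance(X, s):
    """Minimum of |D(tau, b)| over column sets tau with |tau| <= s and columns b not in tau.

    X is a binary matrix given as a list of rows; D(tau, b) is the set of rows
    that have a 1 in column b and a 0 in every column of tau.
    """
    n_rows, t = len(X), len(X[0])
    best = None
    for size in range(min(s, t - 1) + 1):
        for tau in combinations(range(t), size):
            for b in range(t):
                if b in tau:
                    continue
                d = sum(
                    1
                    for a in range(n_rows)
                    if X[a][b] == 1 and all(X[a][c] == 0 for c in tau)
                )
                best = d if best is None else min(best, d)
    return best


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(superimposed_distance(identity, 2))             # 1: every pair is disjunctive
print(superimposed_distance(identity + identity, 2))  # 2: repeating the rows doubles the distance
print(superimposed_distance([[1, 1], [1, 0]], 1))     # 0: column 1 is covered by column 0
```

A positive return value means that the matrix is a superimposed code of strength $s$ in the sense of Definition 4.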

II.  Description  of  the  Construction 

Let $n \ge m \ge h \ge 1$ be integers and $\mathcal{I}_k = (A_k, B_k, \prec_k)$ be
arbitrary incidence systems, $1 \le k \le n$. In this section we
define a new incidence system $\mathcal{I} = \mathcal{I}(n, m, h, \mathcal{I}_1, \dots, \mathcal{I}_n)$.

Consider a new zero symbol "0". For each $k = 1, \dots, n$
define the new incidence system $\mathcal{I}_k^0 = (A_k^0, B_k^0, \prec_k^0)$, where
$A_k^0 = A_k \cup \{0\}$, $B_k^0 = B_k \cup \{0\}$, and the relation $\prec_k^0$ is defined
as follows: 1) $0 \prec_k^0 b$ for any $b \in B_k^0$; 2) $a \nprec_k^0 0$ for any $a \in A_k$;
3) on the sets $A_k$ and $B_k$ the relation $\prec_k^0$ is the same as $\prec_k$.

1 The work of P. Vilenkin was supported by the Russian Foundation
of Basic Research, grant 98-01-00241.

Put $\mathcal{I}(n, m, h, \mathcal{I}_1, \dots, \mathcal{I}_n) = (A, B, \prec)$, where the sets

$$A = \{a = (a_1, \dots, a_n) : a_k \in A_k^0,\ |a| = h\},$$

$$B = \{b = (b_1, \dots, b_n) : b_k \in B_k^0,\ |b| = m\},$$

where $|a|$ and $|b|$ denote the number of non-zero components
in the vectors $a$ and $b$, respectively, and the incidence relation $\prec$
between $A$ and $B$ is defined component-wise, i.e. $a \prec b$ if and
only if $a_k \prec_k^0 b_k$ for all $k = 1, \dots, n$.

This  construction  generalizes  those  which  were  considered 
before  [3,  4,  5]. 

III. Properties of $\mathcal{I} = \mathcal{I}(n, m, h, \mathcal{I}_1, \dots, \mathcal{I}_n)$

Theorem 1. Assume that $1 \le s \le h$ and the system $\mathcal{I}_k$ is
$s$-disjunct for all $k \in \{1, \dots, n\}$. Then $\mathcal{I}$ is also $s$-disjunct.

Theorem 2. Assume that $s \ge 1$ and the system $\mathcal{I}$ is $s$-disjunct.
Then $\mathcal{I}_k$ is also $s$-disjunct for each $k \in \{1, \dots, n\}$.

For positive integers $s$ and $h$ denote by $V_h(s)$ the set of vectors
$v = (v_1, \dots, v_h)$ whose components $v_k$ are non-negative
integers with sum $v_1 + \cdots + v_h = s$. For each vector $v$
denote by $|v|$ the number of positive components $v_k$.

Theorem 3. Let $n \ge m \ge h \ge 1$ be integers and $\mathcal{I}$ be an
arbitrary incidence system. For any $s \ge 1$ the superimposed
$s$-distance of the incidence system $\mathcal{I}(n, m, h, \mathcal{I}, \dots, \mathcal{I})$ has
the form

where the minimum is taken over all vectors $v \in V_h(s)$ for
which $|v| < s$.

In the general case, when the systems $\mathcal{I}_k$ are not the same, the
formula for $D_s(\mathcal{I})$ can be found in [6].

I. Introduction

Let $D_0, \dots, D_n$ be the $\{0,1\}$-matrices forming an association
scheme. Since $\sum_{k=0}^{n} D_k$ is the all-one matrix, a linear
combination $\sum_{k=0}^{n} c_k D_k$ can be regarded as the matrix
$(f(x,y))_{x,y}$ representing the function $f$ defined by $f(x,y) = c_k$
if $(x,y)$ is in the relation corresponding to the matrix $D_k$. Parameters
of functions thus obtained may now be studied by exploiting
properties of the association scheme.

One such parameter is the communication complexity
$C(f)$, which is the number of bits that two persons have to exchange
in order to evaluate $f(x,y)$, when initially one person
only knows $x$ and the other person only knows $y$. Communication
complexity turned out to be an important topic in
computer science, cf. [4]. Connections between communication
complexity and information theory are discussed in [2]
and [3]. The function under consideration is the function

$$f(x,y) = \begin{cases} 1 & \text{if } x, y \text{ are in relation } k,\ k \text{ odd}, \\ 0 & \text{if } x, y \text{ are in relation } k,\ k \text{ even}. \end{cases}$$

The communication complexity can be exactly determined
if for $z = 0, 1$ all eigenvalues of the matrices

$$M_z(f) = \sum_{k \equiv z \bmod 2} D_k$$

are different from 0. Already in [5] we derived the following
identity for the Krawtchouk polynomials $K_k(i,q,n)$.

Theorem 1 [5]: For $z = 0, 1$ it is

$$\sum_{k \equiv z \bmod 2} K_k(i,q,n) = \begin{cases} \frac{1}{2}\left(q^n + (-1)^z (2-q)^n\right), & i = 0, \\ (-1)^z\, 2^{i-1} (2-q)^{n-i+1}, & i \ge 1. \end{cases}$$


The idea of the proof in [6] is to exploit the simultaneous
diagonalizability of the matrices $D_0, \dots, D_n$ of the association
scheme and a recurrence formula for their eigenvalues due to
Delsarte [1]:

$$F(i,k,n) = b^k F(i-1,k,n-1) - b^{k-1} F(i-1,k-1,n-1).$$


The Krawtchouk polynomials and also the Eberlein polynomials
$E_k(i,n,l) = \sum_{j=0}^{k} (-1)^j \binom{i}{j} \binom{n-i}{k-j} \binom{l-i}{k-j}$ obey this recursion
with $b = 1$.

If the function $f$ is defined on the Johnson scheme, then
the eigenvalues of $M_z(f)$ for $z = 0, 1$ are linear combinations
of the Eberlein polynomials

$$e_i(z,n,l) = \sum_{k \equiv z \bmod 2} E_k(i,n,l), \qquad i = 0, \dots, n.$$

Theorem 2: For the function $f$ when defined on the
Johnson scheme the matrices $M_z(f)$, $z = 0, 1$, have full
rank if for all $i = 1, \dots, n$ the Krawtchouk polynomials
$K_{n-i+1}(n-i+1, l+2i-2, 2)$ are different from 0.

Proof: First observe that the eigenvalues $e_0(z,n,l)$ (and
$n > 1$) are both positive for $z = 0, 1$ as the sum of positive
terms and hence different from 0.

$$\begin{aligned} e_i(0,n,l) &= e_{i-1}(0, n-1, l+2) - e_{i-1}(1, n-1, l+2) \\ &= 2^{i-1} \sum_{k=0}^{n-i+1} (-1)^k E_k(0, n-i+1, l+2i-2) \\ &= (-1)^{n-i+1}\, 2^{i-1} K_{n-i+1}(n-i+1, l+2i-2, 2). \end{aligned}$$

So the problem here is to determine when a Krawtchouk
polynomial $K_k(k, m, 2)$ (the degree and the first variable being
the same) can be 0. This is possible for $m$ even and $k = \frac{m}{2}$. We
did not find any other parameter pair $(k, m)$ with this property.
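A small search makes the question concrete. The sketch below assumes the standard binary Krawtchouk polynomial $K_k(x) = \sum_j (-1)^j \binom{x}{j}\binom{m-x}{k-j}$ for length $m$, which may differ from the paper's normalization or argument order, and lists the pairs $(k, m)$ with $K_k(k) = 0$ up to a chosen bound. Under this assumed definition the value at $m = 2k$ reduces to $\sum_j (-1)^j \binom{k}{j}^2$, which cancels for odd $k$. The names `krawtchouk` and `diagonal_zeros` are illustrative.

```python
from math import comb


def krawtchouk(k, x, m):
    """Binary Krawtchouk polynomial of degree k at x for length m (assumed standard form)."""
    return sum((-1) ** j * comb(x, j) * comb(m - x, k - j) for j in range(k + 1))


def diagonal_zeros(max_m):
    """All pairs (k, m) with 0 <= k <= m <= max_m and K_k(k) = 0."""
    return [
        (k, m)
        for m in range(1, max_m + 1)
        for k in range(m + 1)
        if krawtchouk(k, k, m) == 0
    ]


print(diagonal_zeros(20))
```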

A third family of orthogonal polynomials obeying the above
recursion are the $b$-analogues of the Krawtchouk polynomials

$$\sum_{j=0}^{k} (-1)^j\, b^{\binom{j}{2}} \binom{i}{j}_b \binom{n-i}{k-j}_b \prod_{s=0}^{k-j-1} \left(c b^n - b^{i+s}\right),$$

where $\binom{\cdot}{\cdot}_b$ denotes the Gaussian binomial coefficient. The
eigenvalues of the association schemes of bilinear forms over
$GF(b)$ have as parameters a prime power $b$ and $c = b^r$ for
some nonnegative integer $r$. The eigenvalues of the association
schemes of alternating bilinear forms have as parameters $b =
p^2$, the square of a prime $p$, and $c = p$ or $c = \frac{1}{p}$ (cf. [1]). By
calculation modulo 2 it can be derived

Theorem 3: Let a function $f$ be defined as above on the
association scheme of bilinear forms over $GF(b)$ or on the
association scheme of alternating bilinear forms. Further let
the prime $p$ defining the parameters $b$ and $c$ be odd. Then the
matrices $M_z(f)$, $z = 0, 1$, have full rank if $b - 1$ is not a power
of 2.

Abstract - The design of practical and powerful
codes for protection against erasures can be reduced
to optimizing solutions of a highly nonlinear constraint
satisfaction problem. In this paper we will
attack this problem using the Differential Evolution
approach and significantly improve results previously
obtained using classical optimization procedures.

I.  Introduction 

Based on the theoretical results proved in [1], we will in this
paper attack a nonlinear constraint satisfaction problem the
solutions of which correspond to highly efficient codes. The
optimization problem involved will be attacked by Differential
Evolution, a robust optimizer which has proved quite effective
for similar types of problems.

The codes from [1] are built from sparse bipartite graphs
and generalize a classic construction of Gallager [3]. After
collecting the information contained in the received bits, the
decoding algorithm removes the corresponding variable nodes from the
graph together with all edges emanating from them. Then,
at each round, it looks for a check node of degree one, copies
its content into its unique neighbor, updates the values, and
removes the variable node and all edges emanating from it
from the graph. The decoder is successful if the final graph
is empty. It was shown in [1] that if the graph is sampled
uniformly at random from the ensemble of graphs with degree
distributions $(\lambda, \rho)$ (see below for a definition), then the
algorithm successfully recovers from a random $\delta$-fraction of
erasures with high probability iff $\delta\lambda(1 - \rho(1 - x)) < x$ for
$x \in (0, \delta)$. If $\lambda(x) = \sum_i \lambda_i x^{i-1}$ and $\rho(x) = \sum_i \rho_i x^{i-1}$, then
we say that the graph has degree distribution $(\lambda, \rho)$ if the fraction
of edges connected to a variable (check) node of degree $i$
is $\lambda_i$ ($\rho_i$). The task at hand is now to find appropriate polynomials
$\lambda$ and $\rho$ with nonnegative coefficients that give rise to a
code of a given rate such that the above inequality is satisfied
for a large value of $\delta$.
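The erasure decoding procedure described above can be sketched in a few lines. This is a minimal illustration and not the implementation from [1]: instead of deleting nodes from a graph, it repeatedly looks for a check that has exactly one erased neighbor and fills that position in, which is an equivalent formulation. The names `peel` and `checks`, the list-of-checks representation and the small parity-check example are assumptions made for illustration.

```python
def peel(checks, received):
    """Iterative erasure decoding for a binary code given by parity checks.

    checks   : list of lists of variable indices; each check must sum to 0 mod 2
    received : list of bits, with None marking an erased position
    Returns the word after peeling; remaining None entries mean decoding got stuck.
    """
    word = list(received)
    progress = True
    while progress:
        progress = False
        for check in checks:
            erased = [v for v in check if word[v] is None]
            if len(erased) == 1:
                # The single unknown bit equals the parity of the known neighbors.
                word[erased[0]] = sum(word[v] for v in check if word[v] is not None) % 2
                progress = True
    return word


# Three parity checks of a (7,4) code (illustrative example).
checks = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
codeword = [1, 0, 1, 1, 0, 0, 1]

print(peel(checks, [None, 0, 1, 1, 0, None, 1]))        # recovers the codeword
print(peel(checks, [None, None, 1, None, 0, 0, 1]))     # every check sees two erasures: stuck
```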

II.  Differential  evolution 

The code design problem as described above is a nonlinear
constraint satisfaction problem with continuous space parameters,
a problem class where Differential Evolution (DE) [2]
has proven to be very effective. The main steps of DE are
(1) initialization, in which, similar to evolutionary strategies, a
random first generation of vectors is created which changes
over time according to (2) mutation and (3) recombination,
(4) selection of the survivors, and (5) the stopping criterion.
What gives DE its name is the differential nature of the mutation
step, in which at each round random pairwise differences
of two pairs of population vectors are added to population
members. The recombination scheme follows usual evolutionary
algorithms. The reader is invited to consult [2] for more
information on DE.

III.  Code  design 

For designing the code, we started by fixing the rate of the
code and randomly producing degree distributions giving rise
to codes of that rate. For doing this, note first that the conditions
relating the coefficients of $\lambda(x)$ and $\rho(x)$ force the free
coefficients of these polynomials to lie in a finite polytope. Our
first task is then to choose random elements from this polytope.
To achieve this, we implemented a different strategy,
known as the "Queen's move": we started with some point inside
the polytope constructed deterministically, and repeated
the following procedure between 50 and 100 times: we randomly
selected a line through the point, and randomly selected
a point on that line inside the polytope. This gave
us one population member. For the next members, we repeated
the whole procedure again, until all the (initial) population
members were generated. To reduce the dimensionality
of the problem, we did not let the node degrees on the left
and the right take on all possible values in a given
range. Rather, we experimented with the idea of forcing to
zero those $\lambda_k$ and $\rho_k$ which have small values and not
treating them as free parameters subject to optimization. Typically,
we chose the node degrees in the following way: on
the left hand side, we chose the degrees 2, 3, a highest degree
(between 20 and 30) and one degree in between. On
the right hand side, we chose two consecutive degrees, either
7 and 8, or 8 and 9. By way of an example, we mention one
of the rate 1/2 sequences that we found with our method:
$\lambda(x) = 0.26328x + 0.18020x^2 + 0.27000x^6 + 0.28649x^{29}$, $\rho(x) =
0.63407x^7 + 0.36593x^8$. The highest $\delta$ value for this sequence is
0.4955. It can be shown that the highest possible value
attainable with the average degrees of the graphs induced by
these distributions is 0.4985. Hence, this sequence is within
less than 1% of the optimum. Other very good sequences will
be presented in the talk.
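The threshold of a given pair of degree distributions can be estimated numerically from the condition $\delta\lambda(1 - \rho(1 - x)) < x$ on $(0, \delta)$. The sketch below is a rough illustration, not the authors' procedure: it tests the inequality on a finite grid and bisects on $\delta$, so the result is only as accurate as the grid allows. The dictionary representation (exponent to coefficient), the function names and the grid and iteration counts are assumptions; the coefficients are those of the rate 1/2 example, and the design rate is computed as $1 - \int_0^1 \rho / \int_0^1 \lambda$.

```python
# Degree distributions from the edge perspective, stored as {exponent: coefficient}.
LAMBDA = {1: 0.26328, 2: 0.18020, 6: 0.27000, 29: 0.28649}
RHO = {7: 0.63407, 8: 0.36593}


def poly(coeffs, x):
    """Evaluate sum of c * x**e over the {e: c} dictionary."""
    return sum(c * x ** e for e, c in coeffs.items())


def decodable(delta, lam=LAMBDA, rho=RHO, steps=2000):
    """Check delta * lam(1 - rho(1 - x)) < x on a uniform grid of x in (0, delta)."""
    for i in range(1, steps):
        x = delta * i / steps
        if delta * poly(lam, 1.0 - poly(rho, 1.0 - x)) >= x:
            return False
    return True


def threshold(lam=LAMBDA, rho=RHO, iters=40):
    """Largest delta (up to grid and bisection accuracy) passing the check."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if decodable(mid, lam, rho):
            lo = mid
        else:
            hi = mid
    return lo


def rate(lam=LAMBDA, rho=RHO):
    """Design rate 1 - (integral of rho) / (integral of lambda) over [0, 1]."""
    int_lam = sum(c / (e + 1) for e, c in lam.items())
    int_rho = sum(c / (e + 1) for e, c in rho.items())
    return 1.0 - int_rho / int_lam


print(round(rate(), 4), round(threshold(), 4))
```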

Abstract - In this paper, new results on insertion
and/or deletion correcting codes are presented.
Firstly, new properties relating codewords to subwords
are investigated. Secondly, a new error-correcting
scheme based on convolutional coding is proposed.

I.  Codewords  and  computer  search 

An alternative way of representing binary words is used
which simplifies the process of determining subwords after
insertion/deletion errors. All the binary words are characterized
by the lengths of the runs present in the word as well as the starting
bit, e.g. 10000100 → 1412/1. In the case of deletion errors, all
the subwords can be obtained by decreasing the size of each
run present in the word. If the first run's size is 1 and it is
deleted, the starting bit will change. If any other run of size 1
is deleted, the two neighbouring runs will merge. For insertion
errors, the subwords are obtained by adding bits either to the
beginning or the end of the word, increasing the size of the
runs or by splitting existing runs.

Assume that binary words of length $n$ are used and that $s$
denotes the number of insertion and/or deletion errors. Since
a binary word and its complement have complementary subwords,
it is only necessary to compute the subwords of $2^{n-1}$
words. Complementing the starting bit of the already calculated
words/subwords forms the other $2^{n-1}$ words/subwords.
This method is used to construct subword books that contain
the subwords of all $2^n$ binary words after $s = 1$ errors. Using
the $s = 1$ subword book and repeating the procedure on
all the subwords, an $s = 2$ subword book can be formed. By
searching the subword books, codewords can be chosen that
do not have a common subword. Cardinalities of codebooks
found by computer searching $s = 2$ subword books will be presented
and compared to known $s = 2$ correcting codebooks by
Helberg [1].

By inspecting the subword books and using generating
functions, it is possible to determine the number of subwords
that a binary word will produce. The number of subwords
after deletions depends on the runs in the word. Let $x$
denote the binary codeword and $r(x)$ be the number of runs
in $x$. In the case of $s = 1$ deletions, $r(x)$ subwords will be
formed. Let $\lambda(x, y)$ indicate the size of the $y$-th run in $x$. For
$s = 2$ deletions, the number of subwords will be given by:

$$\frac{1}{2}\left(p^2 + q^2 + 2pq - p + q\right) - r \qquad (1)$$

where $p$ is the number of $\lambda(x,y) = 1$, $q$ the number
of $\lambda(x,y) \ge 2$ and $r$ the number of $\lambda(x,y) = 1$, where
$2 \le y \le r(x) - 1$. The number of new words after insertions
depends on the length of the word. For $s = 1$ insertion
there will be $n + 2$ new words. For $s = 2$ insertions it is given
by:

$$\frac{1}{2}\left(n^2 + 5n + 8\right) \qquad (2)$$

Because the number of new words for insertions is fixed, this
fact can be used to establish an upper bound. According to
Levenshtein, a code capable of correcting $s$ deletions will also
be able to correct $s$ deletions and/or insertions [2]. Therefore
this insertion upper bound provides an upper bound for
$s$-correcting codes in general.
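The run-length representation and the subword counts can be reproduced by brute force for short words. The sketch below is an illustration only: it enumerates deletion subwords and insertion superwords directly instead of building subword books, and evaluates the two-deletion count in the form $\frac{1}{2}(p^2 + q^2 + 2pq - p + q) - r$ from the run lengths. All function names are hypothetical.

```python
from itertools import combinations, groupby


def run_lengths(word):
    """Run-length form used in the text: (list of run sizes, starting bit)."""
    return [len(list(group)) for _, group in groupby(word)], word[0]


def deletion_subwords(word, s):
    """All distinct words obtained from `word` by deleting s symbols."""
    n = len(word)
    return {"".join(word[i] for i in keep) for keep in combinations(range(n), n - s)}


def insertion_superwords(word, s):
    """All distinct binary words obtained from `word` by inserting s symbols."""
    words = {word}
    for _ in range(s):
        words = {w[:i] + bit + w[i:] for w in words for i in range(len(w) + 1) for bit in "01"}
    return words


def two_deletion_count(word):
    """Number of subwords after two deletions, computed from the run lengths."""
    runs, _ = run_lengths(word)
    p = sum(1 for size in runs if size == 1)        # runs of size 1
    q = sum(1 for size in runs if size >= 2)        # runs of size at least 2
    r = sum(1 for size in runs[1:-1] if size == 1)  # interior runs of size 1
    return (p * p + q * q + 2 * p * q - p + q) // 2 - r


w = "10000100"
print(run_lengths(w))                                         # ([1, 4, 1, 2], '1')
print(len(deletion_subwords(w, 1)))                           # 4, the number of runs
print(len(deletion_subwords(w, 2)), two_deletion_count(w))    # 7 7
print(len(insertion_superwords(w, 1)), len(insertion_superwords(w, 2)))  # 10 56
```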

II.  New  proposed  scheme 

We further present a new coding scheme based in part on a
parallel convolutional encoder. Insertion/deletion errors result
in a long burst error after the error has occurred. This means that
any bits received after an insertion/deletion error cannot be
used in error correcting. For this reason it is proposed that encoding
proceed as normal, up to a certain length, but that the
encoded data be sent in reverse over the channel. This results
in an encoded data stream that is able to detect errors in the
coming data, with the assumption that data already received
is correct or already corrected by the decoder. Two encoders
with  rates  R  =  |  and  R  =  \  are  presented.  Both  encoders 
are  able  to  correct  insertion,  deletion  or  reversal  errors,  given 
that  the  channel  is  limited  to  one  type  of  error. 

Whenever an insertion/deletion error occurs and the syndrome
indicates an error, a bit is deleted/inserted in a certain
place relative to the syndrome error and the syndromes are recalculated.
Since the inserted/deleted bit will not always be in
the correct position, there is a possibility of a short burst of
reversal errors. The new syndrome can then be used to correct
these errors. In the case of reversal errors, the syndrome
can be used as is done for error correction.

Abstract - It is well known that Reed-Muller
(RM) codes are not linear unequal error protection
(LUEP) codes because the set of minimum-weight
vectors spans Reed-Muller codes (punctured or not)
[1, 2]. In this paper, we show that most RM
codes are LUEP codes if RM codes are encoded with a
recursively decomposed trellis oriented generator matrix
(TOGM) and maximum-likelihood trellis decoding
(MLTD) is used.

I.  Introduction 

Unequal error protection codes protect some information
bits against a greater number of errors than other information
bits. LUEP codes were first introduced by Masnick and Wolf
[3]. Boyarinov and Katsman [4] found conditions for linear
codes to be LUEP. Let $C$ be an $(n, k, d)$ linear code. It is
shown in [1] that if the minimum-weight vectors of a linear
code $C$ do not span it, then $C$ is an LUEP code. It is well
known that the minimum-weight vectors span RM
codes (punctured or not) [2]. Therefore, RM codes are not
LUEP codes in algebraic decoding. In soft-decision maximum
likelihood decoding, the bit-error-rate of an RM code depends
on the weight distribution of the code. If a non-systematic GM is
used for encoding the RM code and soft-decision maximum
likelihood decoding is used, different sets of information
bits have different bit-error-rates since each set
has a different weight distribution. Therefore, even though an RM
code is not an LUEP code in algebraic decoding, an RM code is
an LUEP code in soft-decision maximum likelihood decoding
if a systematic GM is not used for encoding.

In particular, in this paper, LUEP RM codes are constructed
by using a recursively decomposed TOGM for encoding. Simulations
show that the bit-error-rate of some information bits is
almost twice as good as that of the other information bits.
By using the recursive decomposition, a simple trellis diagram
with parallel structure for the RM code is devised. In ML
trellis decoding, information bits are retrieved directly from
the labeling of the trellis.

II.  Recursive  Decomposition  of  Reed-Muller 
Codes  and  Its  Trellis 

Let $RM(r,m)$ denote the $r$-th order binary RM code of
length $2^m$ [1, 2]. This code has minimum Hamming distance
$d = 2^{m-r}$ and the dimension

$$K(r,m) = 1 + \binom{m}{1} + \cdots + \binom{m}{r}.$$

Let $T$ be a (2,1,2) binary linear code with the following generator
matrix $G_T = (1\ \ 1)$. And let $W$ be a (2,2,1)
binary linear code with the following generator matrix $G_W =
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Let $[RM(r,m-1)/RM(r-1,m-1)]$ denote
the set of representatives of the cosets of $RM(r-1,m-1)$
in $RM(r,m-1)$, let $G(r,r-1,m-1)$ be the generator matrix
for the $[RM(r,m-1)/RM(r-1,m-1)]$ coset code and
$E(r,r-1,m-1)$ be the dimension of $[RM(r,m-1)/RM(r-1,m-1)]$.
Then the generator matrix for $RM(r,m)$ is as follows:

$$G(r,m) = G(r,r-1,m-1) \otimes G_T \;\oplus\; G(r-1,r-2,m-2) \otimes G_T \otimes G_W \;\oplus\; G(r-2,m-2) \otimes G_W \otimes G_W,$$

where $\otimes$ and $\oplus$ denote the direct product and direct sum, respectively.
Therefore, $K(r,m) = E(r,r-1,m-1) + 2 \times E(r-1,r-2,m-2) + 4 \times K(r-2,m-2)$.
Let $K = K(r,m) = K_1 + K_2 + K_3 = E(r,r-1,m-1) + 2 \times E(r-1,r-2,m-2) + 4 \times K(r-2,m-2)$.
Let $W_1$, $W_2$, and $W_3$ be the weight distributions of $G(r,r-1,m-1) \otimes G_T$,
$G(r-1,r-2,m-2) \otimes G_T \otimes G_W$, and
$G(r-2,m-2) \otimes G_W \otimes G_W$, respectively. Then the bit-error-rates
of the $K_1$, $K_2$, and $K_3$ information bits depend on the weight
distributions $W_1$, $W_2$, and $W_3$, respectively.

1 This work was supported by LSI LOGIC Corporation.

III.  Examples  and  Simulation  Results 
Consider the RM(2, 5) code, which is a (32, 16) RM code with minimum
Hamming distance 8. Let $b = (b_1, b_2, \ldots, b_{16})$ be the 16
information bits and $v = (v_1, v_2, \ldots, v_{32})$ be the corresponding
codeword in RM(2, 5). Then

$$\begin{aligned}
v &= b\, G(2,5) \\
  &= (b_1, b_2, b_3, b_4, b_5, b_6)\; G(2,1,4) \otimes G_T \\
  &\quad \oplus\; (b_7, b_8, b_9, b_{12}, b_{13}, b_{14})\; G(1,0,3) \otimes G_T \otimes G_W \\
  &\quad \oplus\; (b_{10}, b_{11}, b_{15}, b_{16})\; G(0,3) \otimes G_W \otimes G_W,
\end{aligned}$$

where $0_N$ means N consecutive zeros. The first 6 bits,
$b_1, b_2, b_3, b_4, b_5, b_6$, select one of the 64 subtrellises. For the
left (16,5,8) code, $(b_7, b_8, b_9)$ selects one of the 8 subtrellises,
which are of length 16. Then $b_{10}$ selects a codeword in the
left (8,1,8) code and $b_{11}$ selects a codeword in the right (8,1,8)
code. For the right (16,5,8) code, $(b_{12}, b_{13}, b_{14})$ selects one
of the 8 subtrellises, which are of length 16. Then $b_{15}$ selects a
codeword in the left (8,1,8) code and $b_{16}$ selects a codeword
in the right (8,1,8) code. Simulation results show that the group
$b_{10}, b_{11}, b_{15}, b_{16}$ achieves about 0.14 dB coding gain over the
groups $b_1, b_2, b_3, b_4, b_5, b_6$ and $b_7, b_8, b_9, b_{12}, b_{13}, b_{14}$.
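The sizes of the three bit groups in this example can be reproduced from the dimension split of Section II. The sketch below assumes that the middle term of the split is $2 \times E(r-1, r-2, m-2)$, which is the choice that matches the generator matrix decomposition; the names `K`, `E` and `bit_groups` are illustrative only.

```python
from math import comb

def K(r, m):
    """Dimension of RM(r, m); taken as 0 when r < 0."""
    return sum(comb(m, i) for i in range(min(r, m) + 1)) if r >= 0 else 0

def E(a, b, m):
    """Dimension of the coset code [RM(a, m)/RM(b, m)]."""
    return K(a, m) - K(b, m)

def bit_groups(r, m):
    """(K1, K2, K3): sizes of the three information-bit groups of RM(r, m)."""
    return (E(r, r - 1, m - 1),
            2 * E(r - 1, r - 2, m - 2),   # assumed form of the middle term
            4 * K(r - 2, m - 2))

print(bit_groups(2, 5))  # (6, 6, 4)
```

The three sizes add up to K(2, 5) = 16, in agreement with the grouping of the 16 information bits used above.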

Abstract — This paper presents a class of codes
protecting data on two-dimensional symbology against
errors caused by stain, tear, scratch or blurring.

I.  Introduction 

In two-dimensional symbology [1], nonbinary characters such
as letters and numbers are expressed two-dimensionally as
patterns of black and white pixels, denoted as binary '1' or '0',
in the recording medium. These pixels are sometimes disturbed by
stain, tear, scratch or blurring, which changes black pixels
into white ones, or vice versa. These disturbances
result in unidirectional errors [2][3] in a binary space.

Each nonbinary character is usually expressed as a block
of binary digits of fixed size, called a byte. In some digital
systems, the $2^b$ b-bit byte patterns are not fully assigned to
q-ary characters; that is, the total number of q-ary characters
used in the system, q, is less than $2^b$. Therefore, the remaining
$(2^b - q)$ b-bit byte patterns are not used in the system, which
makes it possible to design efficient codes.

This paper proposes a class of codes for q-ary data which
can correct single unidirectional b-bit byte errors in a binary
space, called q-ary single unidirectional b-bit byte error
correcting (1-UbEC) codes, with $q < 2^b$.

II.  Code  Construction 

Let a, c be elements in the Galois field $GF(p_1)$, i.e., $a, c \in GF(p_1)$,
and let b, d be elements in the Galois field $GF(p_2)$, i.e.,
$b, d \in GF(p_2)$. The set $R(p_1, p_2)$ with $p_1 \times p_2$ elements
defined by the following conditions is a ring:

(1) $\langle a, b \rangle \in R(p_1, p_2)$,

(2) $\langle a, b \rangle \oplus \langle c, d \rangle = \langle a +_1 c,\; b +_2 d \rangle$,

(3) $\langle a, b \rangle \otimes \langle c, d \rangle = \langle a \times_1 c,\; b \times_2 d \rangle$,

where $+_i$ and $\times_i$ are the additive and multiplicative operations
between two elements in $GF(p_i)$, $i = 1, 2$, respectively.
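The componentwise ring operations can be sketched as follows. This is only an illustration under the assumption that $p_1$ and $p_2$ are primes, so that arithmetic in $GF(p_i)$ is integer arithmetic modulo $p_i$; the values 3 and 5 and the class name `RingElem` are hypothetical.

```python
from dataclasses import dataclass

P1, P2 = 3, 5  # hypothetical prime field sizes, chosen only for illustration

@dataclass(frozen=True)
class RingElem:
    """Element <a, b> of R(p1, p2), with a in GF(p1) and b in GF(p2)."""
    a: int
    b: int

    def __add__(self, other):
        # Rule (2): add the two components in their own fields.
        return RingElem((self.a + other.a) % P1, (self.b + other.b) % P2)

    def __mul__(self, other):
        # Rule (3): multiply the two components in their own fields.
        return RingElem((self.a * other.a) % P1, (self.b * other.b) % P2)

x, y = RingElem(2, 4), RingElem(1, 3)
print(x + y)  # RingElem(a=0, b=2)
print(x * y)  # RingElem(a=2, b=2)
```

Because both operations act independently on the two components, the ring axioms follow directly from those of the two fields.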

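Theorem 1 Let $H_i$ be a parity check matrix of an $(n_i, n_i - r)$
systematic single error correcting code over $GF(p_i)$, where
$i = 1, 2$, as shown below:

$$H_1 = [\, h'_1\ h'_2\ \ldots\ h'_{n_1} \,], \qquad H_2 = [\, h''_1\ h''_2\ \ldots\ h''_{n_2} \,],$$

where $h' = (a_0 \ldots a_{r-1})^T$, $a_l \in GF(p_1)$, $0 \le l < r$, and
$h'' = (b_0 \ldots b_{r-1})^T$, $b_l \in GF(p_2)$, $0 \le l < r$. The linear code
defined by the following parity check matrix $H_0$ over $R(p_1, p_2)$
is a code capable of correcting single errors of type $\langle \alpha, \beta \rangle$.

$$H_0 = [\, \langle h'_1, h''_1 \rangle \ldots \langle h'_1, h''_{n_2} \rangle \mid \ldots \mid \langle h'_{n_1}, h''_1 \rangle \ldots \langle h'_{n_1}, h''_{n_2} \rangle \,]$$

Here, $\langle h'_i, h''_j \rangle$ ($0 < i \le n_1$, $0 < j \le n_2$) represents the vector
$(\langle a_0, b_0 \rangle \ldots \langle a_{r-1}, b_{r-1} \rangle)^T$. □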

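The code construction requires a function f which maps from a
set V of binary vectors of length b to $R(p_1, p_2)$, i.e.,
$f : V \to R(p_1, p_2)$, satisfying the following three conditions:

(i) if $f(\mathbf{i}) = f(\mathbf{j})$, then $\mathbf{i} = \mathbf{j}$,

(ii) if $(f(\mathbf{i}) = \langle a, b \rangle) \wedge (f(\mathbf{j}) = \langle a, d \rangle) \wedge (b \ne d)$,
then the weight of $\mathbf{i}$ is equal to that of $\mathbf{j}$,

(iii) if $(f(\mathbf{i}) = \langle a, b \rangle) \wedge (f(\mathbf{j}) = \langle c, b \rangle) \wedge (a \ne c)$,
then $\mathbf{i}$ and $\mathbf{j}$ are unordered,

where $\mathbf{i}$ and $\mathbf{j}$ are binary vectors each having length b.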
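The three conditions on f can be checked mechanically for a candidate mapping. The sketch below is an illustration, not the authors' procedure: it represents f as a dictionary from binary tuples to pairs (a, b), and it assumes the usual meaning of "unordered", namely that neither vector covers the other componentwise. The names `unordered` and `satisfies_conditions` are hypothetical.

```python
from itertools import combinations

def unordered(i, j):
    """True if neither binary vector covers the other componentwise."""
    i_le_j = all(x <= y for x, y in zip(i, j))
    j_le_i = all(y <= x for x, y in zip(i, j))
    return not (i_le_j or j_le_i)

def satisfies_conditions(f):
    """f: dict mapping binary tuples (the set V) to pairs (a, b) in R(p1, p2)."""
    for i, j in combinations(f, 2):
        (a, b), (c, d) = f[i], f[j]
        if (a, b) == (c, d):                            # (i) f is one-to-one
            return False
        if a == c and b != d and sum(i) != sum(j):      # (ii) equal weights
            return False
        if b == d and a != c and not unordered(i, j):   # (iii) unordered
            return False
    return True

print(satisfies_conditions({(0, 1): (0, 0), (1, 0): (1, 0)}))  # True
print(satisfies_conditions({(0, 0): (0, 0), (1, 1): (1, 0)}))  # False
```

The second mapping fails condition (iii) because 00 is covered by 11.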


Encoding Procedure: The following notations are used
in the algorithm to construct q-ary 1-UbEC codes.

$d_i$: q-ary character, $1 \le i \le K$.

$\langle a_i, b_i \rangle$: Information element in $R(p_1, p_2)$, $1 \le i \le K$.

$\langle \bar{a}_j, \bar{b}_j \rangle$: Check element in $R(p_1, p_2)$, $1 \le j \le R$.

$\mathbf{d}_i$: Binary information vector with length b, $1 \le i \le K$.

$\mathbf{p}_j$: Binary check vector with length b, $1 \le j \le R$.

$f^{-1}$: Inverse function of f.

g: One-to-one function from the set of q-ary characters to the set
$\{f(\mathbf{x}) \mid \forall \mathbf{x} \in V\}$.

h: One-to-one function from $R(p_1, p_2)$ to a set of $p_1 \times p_2$
binary vectors each having length b.

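Let $(d_1, d_2, \ldots, d_K)$ be an input q-ary information vector.
With the above preparation, encoding proceeds as follows:

1) Determine the function $f : V \to R(p_1, p_2)$, where V has
q vectors, $q < 2^b$.

2) Obtain the information elements $\langle a_i, b_i \rangle$ by
$\langle a_i, b_i \rangle = g(d_i)$, where $1 \le i \le K$.

3) Obtain the check elements $\langle \bar{a}_j, \bar{b}_j \rangle$, $1 \le j \le R$, which
satisfy the following equation:

$$\mathbf{0} = (\langle a_1, b_1 \rangle, \ldots, \langle a_K, b_K \rangle, \langle \bar{a}_1, \bar{b}_1 \rangle, \ldots, \langle \bar{a}_R, \bar{b}_R \rangle) \cdot H^T,$$

where H is an $R \times (K + R)$ shortened matrix of $H_0$, and $\mathbf{0}$ is
a $1 \times R$ zero matrix.

4) Obtain $\mathbf{d}_i = f^{-1}(\langle a_i, b_i \rangle)$ for $1 \le i \le K$ and
$\mathbf{p}_j = h(\langle \bar{a}_j, \bar{b}_j \rangle)$ for $1 \le j \le R$. Finally,
$(\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_K, \mathbf{p}_1, \ldots, \mathbf{p}_R)$
is the encoded output.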
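Step 3 of the procedure amounts to solving a syndrome equation over the ring. As a hedged illustration, the sketch below assumes that the shortened matrix has the systematic form $H = [P \mid I_R]$ and that $p_1$, $p_2$ are primes, in which case each check element is the negated inner product of the information elements with one row of P. The field sizes, the matrix P and the function name `check_elements` are all hypothetical.

```python
P1, P2 = 3, 5  # hypothetical prime field sizes

def check_elements(info, P):
    """info: list of K pairs (a, b); P: R x K matrix of pairs, H = [P | I_R] assumed.
    Returns the R check pairs c such that info * P^T + c = 0 in R(p1, p2)."""
    checks = []
    for row in P:
        # Inner product, taken separately in each component field.
        sa = sum(a * pa for (a, _), (pa, _) in zip(info, row)) % P1
        sb = sum(b * pb for (_, b), (_, pb) in zip(info, row)) % P2
        checks.append(((-sa) % P1, (-sb) % P2))
    return checks

info = [(1, 2), (2, 4)]
P = [[(1, 1), (1, 1)],
     [(1, 1), (2, 3)]]
print(check_elements(info, P))  # [(0, 4), (1, 1)]
```

Appending these check elements to the information elements gives a word whose syndrome is zero in both components.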

III.  Evaluation 

Fig. 1 shows that the proposed codes are more efficient than the
conventional codes which can correct single unidirectional byte
errors with $q = 2^b$, i.e., $2^b$-ary 1-UbEC codes [3], and the single
symmetric byte error correcting codes [2].


Fig. 1: Relation between the information byte-length and the
check bit-length for q-ary 1-UbEC codes, where b = 6.
(Horizontal axis: information byte length.)

Abstract — Wiberg et al. [6] proposed graphical code
realizations using three kinds of elements: symbol
variables, state variables and local constraints. We
focus on normal realizations, namely Wiberg-type
realizations in which all symbol variables have degree 1
and state variables have degree 2.

A natural graphical model of a normal realization
represents symbols by leaf edges, states by ordinary
edges, and local constraints by vertices. Any such
graph may be decoded by message-passing (the
sum-product algorithm).

We  show  that  any  Wiberg-type  realization  may  be 
put  into  normal  form  without  essential  change  in  its 
graph  or  its  decoding  complexity. 

Group  or  linear  codes  are  realized  by  group 
or  linear  realizations.  We  show  that  an  appropriately 
defined  dual  of  a  group  or  linear  normal  realization 
realizes  the  dual  group  or  linear  code.  The  symbol 
variables,  state  variables  and  graph  topology  of  the 
dual  realization  are  unchanged,  while  local  constraints 
are  replaced  by  their  duals. 

I.  Summary 

Tanner [5] founded the subject of “codes on graphs,” building
on Gallager’s work on low-density parity-check (LDPC)
codes [2]. A “Tanner graph” is a bipartite graph in which
there are two types of vertices, representing symbol variables
and local constraints (e.g., parity checks). Tanner also developed
the algorithm now generically known as the “message-passing”
or “sum-product” algorithm for decoding codes on
graphs, generalizing Gallager’s APP (a posteriori probability)
decoding algorithm, and proved that this algorithm performs
exact APP decoding on arbitrary cycle-free graphs.

Wiberg et al. [6] made an important advance by introducing
a third type of vertex, representing state variables. They
thus made connections with trellis representations of codes,
and with turbo codes and turbo decoding algorithms. Since
this work, “codes on graphs” have become the common
intellectual foundation for the study both of moderate-complexity
codes such as traditional block and convolutional codes, and
of capacity-approaching codes such as turbo codes and LDPC
codes [1, 3]. The more powerful codes are based on graphs
with cycles; their graph-based decoding algorithms have been
shown empirically to work very well, even though few
theorems are known for graphs with cycles.

In this paper, we consider Wiberg-type realizations in
which symbol variables and state variables are restricted to
degrees 1 and 2, respectively, called normal realizations. We
show that such a restriction involves no loss of generality nor
increase in graphical or decoding complexity. With this
restriction, we are able to prove a powerful and general duality
theorem which applies to group or linear graphical models of
arbitrary topology, and in particular to graphs with cycles.


A Wiberg-type realization [6] is based on a set of symbol
variables $\{A_k, k \in I_A\}$, a set of state variables $\{S_j, j \in I_S\}$,
and a set of local constraints $\{C_i, i \in I_C\}$, each constraining some
subset of the variables. The realization generates a code C
consisting of all symbol configurations a that occur as part of
some global symbol/state configuration (a, s) that satisfies all
local constraints. In the linear or group case, each variable
is a vector space or group, the local constraints are linear or
group codes, and the code C is then a linear or group code.

The degree of a variable is the number of local constraints
in which it is involved. A Wiberg-type realization is normal
if the degree of each symbol variable is 1 and of each state
variable is 2. For example, a conventional state realization
(trellis) has local constraints corresponding to trellis sections
that involve triples $(s_k, a_k, s_{k+1})$, $k \in \mathbb{Z}$, and thus is normal.

A normal realization is naturally represented by a normal
graph consisting of degree-1 leaf edges representing symbol
variables $A_k$, degree-2 ordinary edges representing state
variables $S_j$, and vertices representing local constraints $C_i$. An
edge is connected to a vertex if the corresponding variable is
involved in the corresponding local constraint.

It  is  easy  to  show  that  any  Wiberg-type  realization  may 
be  converted  to  a  normal  realization  by  replicating  variables. 
The  normal  graph  of  the  resulting  realization  looks  essentially 
the  same  as  the  Wiberg-type  graph  of  the  original  realization, 
and  may  be  decoded  with  the  same  complexity. 

The dual realization of a group or linear normal realization
is the realization in which each variable is replaced by its
character group (the same variable, if its alphabet is finite),
each local code is replaced by its dual code, and a sign
inverter is inserted in each ordinary edge. We prove that a dual
realization realizes the dual group or linear code, regardless
of the topology of the associated normal graph. This result
greatly generalizes Mittelholzer’s result [4] for dual trellises,
and shows that the dual of any code may be realized by use
of the same graph and same state spaces as the primal code.
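The duality statement can be checked by brute force on a toy binary example, where the sign inverters are trivial. The sketch below is an illustration with hypothetical names (`dual`, `realized_code`) and a hypothetical single-cycle normal graph with three vertices, three symbol edges and three state edges; it is not taken from the paper.

```python
from itertools import product

def dual(code, n):
    """Binary dual: all n-tuples orthogonal (mod 2) to every word of `code`."""
    return {x for x in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(x, c)) % 2 == 0 for c in code)}

def realized_code(symbols, states, constraints):
    """Symbol parts of all global configurations satisfying every local constraint."""
    names = symbols + states
    code = set()
    for values in product((0, 1), repeat=len(names)):
        cfg = dict(zip(names, values))
        if all(tuple(cfg[v] for v in involved) in local
               for involved, local in constraints):
            code.add(tuple(cfg[s] for s in symbols))
    return code

parity = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
repetition = {(0, 0, 0), (1, 1, 1)}
symbols, states = ["a0", "a1", "a2"], ["s0", "s1", "s2"]
# Each symbol appears in one constraint, each state in two (a normal realization).
primal = [(("s0", "a0", "s1"), parity),
          (("s1", "a1", "s2"), repetition),
          (("s2", "a2", "s0"), parity)]
dual_realization = [(v, dual(c, len(v))) for v, c in primal]

print(realized_code(symbols, states, dual_realization)
      == dual(realized_code(symbols, states, primal), len(symbols)))  # True
```

Only the local constraints change between the two realizations; the variables and the graph are the same, as the theorem states.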

Abstract — The graphical representation of codes
has opened the way to soft decoding by belief
propagation (BP), which extends the usual soft Viterbi
decoding. This simple algorithm is most often used for
constructing and evaluating graphical codes. We show
that belief propagation on graphs is not always
appropriate and that the algorithmic resources for graphical
models extend far beyond BP. In particular,
we propose new approximate decoders based on
the “conditioning technique” to solve the short cycles
problem of graphical codes.

I.  Introduction 

On graphical representations, turbo-decoding is equivalent
to the belief propagation (BP) algorithm [1]. BP on graphs
converges to the exact posterior marginals as long as the graph
has a tree structure [3]. Surprisingly, the algorithm still
provides a good approximation of posterior marginals even in
the presence of cycles, as turbo-codes have revealed. This
holds in particular when the graph has “long” cycles since,
around a given variable, it can be well-approximated by a
tree: measurements too far away from a given node have
little influence on this node.

The graphical construction of codes and decoders looks
very promising, but it may be somewhat misleading,
because the construction relies on a single algorithm, which
induces a confusion between the properties of the code itself
and those of the decoding algorithm. A “good” graphical code
can be understood as a structure providing the highest
degree of protection to each bit. This suggests high correlations
between variables of the graph, so that many measurements
bring information on each bit. This, in turn, suggests
“compact graphs” containing many short cycles. But such graphs
are precisely those for which BP is not expected to work well.
This may explain why good graphical codes found up to now
usually rely on large graphs.

However, Bayesian estimation for graphical models starts
with BP but also provides a wide range of techniques to deal
with cyclic graphs. In particular, exact computations can be
performed despite the presence of cycles. The price to pay
is an increased complexity of the algorithm. The most
interesting point is that exact and approximate methods can be
mixed, which allows us to tune the trade-off between
complexity and precision.


II. Experimental framework: two-scale codes

There is an easy way of increasing the compactness of a
graphical code at low cost, without disturbing its
apparent structure too much (cf. figure): simple parity bits can be
replaced by an ordinary algebraic code, whence the name
two-scale codes (cf. Tanner in [2]). Re-expanding the coarse scale
structure to show each bit reveals that many cycles have
been introduced at the fine scale. The figure gives an example
of such a code, seen at two different scales. One can imagine
two algorithms for decoding this (26,8) code: either BP on
the fine scale graph (-b-), or BP on the coarse scale graph (-a-),
i.e. the tetrahedron. Simulation results show that both
algorithms converge rapidly, but the second one is much better.
This phenomenon reveals that correlation between variables
of the graph plays a central role in the performance of an
estimation algorithm, and in particular that short cycles perturb
BP very much.

III. Dealing with short cycles: beyond belief propagation

Conditioning. Markov field theory provides a simple and
elegant result: conditionally on a given variable $X_a$ in the field,
the remaining variables still obey a Markov field, the graph
of which is obtained simply by removing vertex $X_a$ from the
original graph. Let us consider a graphical model composed
of one cycle only. Removing one vertex in the graph opens the
cycle, which yields a simple Markov chain structure that is
amenable to exact estimation through BP. This is the basis of
the conditioning method, the originality of which is to propose
a way of properly handling the variable that has been removed.

Approximate conditioning. One interesting aspect of
the conditioning method is that it offers an alternative to the
aggregation procedure, which gets back to a tree (the “junction
tree”) by grouping variables. However, the overall complexities
of both methods are similar in many cases. But conditioning
has another interesting point: it leads to new approximate
decoding algorithms that mix the conditioning method with
approximate BP on graphs with cycles. The idea is to break
only part of the cycles, and in particular short cycles, in order
to obtain a simplified graph on which belief propagation will
perform well. This simple strategy gives excellent results, at
low cost, on graphical codes that resist the BP algorithms.
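The conditioning idea of Section III can be illustrated on a single cycle of binary variables. The sketch below is a minimal example under stated assumptions, not the authors' decoder: the model is a pairwise Markov field given by hypothetical tables `unary` and `pair`, variable $X_0$ is the one removed, exact chain inference is run once per value of $X_0$, and the results are recombined. A brute-force enumeration is included only for comparison.

```python
from itertools import product

def marginal_by_conditioning(unary, pair):
    """P(X_0) on the cycle X_0 - X_1 - ... - X_{n-1} - X_0 (binary variables).
    unary[i][x] is local evidence; pair[i][x][y] couples X_i and X_{(i+1) % n}."""
    n = len(unary)
    weights = []
    for x0 in (0, 1):
        # With X_0 fixed, the cycle opens into the chain X_1 ... X_{n-1}.
        msg = [pair[0][x0][x1] * unary[1][x1] for x1 in (0, 1)]
        for i in range(1, n - 1):
            msg = [sum(msg[x] * pair[i][x][y] for x in (0, 1)) * unary[i + 1][y]
                   for y in (0, 1)]
        total = sum(msg[x] * pair[n - 1][x][x0] for x in (0, 1))
        weights.append(unary[0][x0] * total)
    z = sum(weights)
    return [w / z for w in weights]

def marginal_brute_force(unary, pair):
    """Same marginal by summing over all 2^n configurations."""
    n = len(unary)
    weights = [0.0, 0.0]
    for xs in product((0, 1), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][xs[i]] * pair[i][xs[i]][xs[(i + 1) % n]]
        weights[xs[0]] += w
    z = sum(weights)
    return [w / z for w in weights]

unary = [[0.9, 0.1], [0.5, 0.5], [0.3, 0.7], [0.6, 0.4]]
pair = [[[2.0, 1.0], [1.0, 2.0]]] * 4
print(marginal_by_conditioning(unary, pair))
print(marginal_brute_force(unary, pair))
```

The cost of conditioning grows with the number of values of the removed variable, which is the complexity price mentioned in the introduction.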

Abstract — In a graph G = (V, E), a subset of vertices
C (= code) is called t-identifying if for all $v \in V$
the sets $B_t(v) \cap C$ consisting of all elements of C within
distance t from v are nonempty and different. We
study some properties of these codes.

I.  Introduction 

Let G = (V, E) be an undirected connected graph (finite or
infinite). We denote by

$$B_t(v) = \{x \in V : d(x, v) \le t\}$$

the ball of radius t centred at the vertex $v \in V$, where d(x, v)
equals the number of edges in a shortest path between v and x.
If $d(x, v) \le t$, then we say that x covers v (and vice versa).

A code C is a nonempty subset of V. Its elements are
called codewords. The code C is a t-identifying code if the
sets $B_t(v) \cap C$, $v \in V$, are all nonempty and different.
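For a finite graph the definition can be tested directly. The following sketch is illustrative only: the graph is given as a hypothetical adjacency dictionary `adj`, balls are computed by breadth-first search, and the example graph is a cycle on six vertices.

```python
from collections import deque

def ball(adj, v, t):
    """B_t(v): vertices within distance t of v, found by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def is_identifying(adj, code, t):
    """True if the sets B_t(v) & code are all nonempty and pairwise different."""
    code = set(code)
    seen = set()
    for v in adj:
        ident = frozenset(ball(adj, v, t) & code)
        if not ident or ident in seen:
            return False
        seen.add(ident)
    return True

cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
print(is_identifying(cycle6, {0, 1, 2, 3}, 1))  # True
print(is_identifying(cycle6, {0, 3}, 1))        # False
```

In the second call, vertices 0 and 1 are both covered only by codeword 0, so they cannot be told apart.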

This definition is motivated by fault diagnosis in multiprocessor
systems: a multiprocessor system can be modeled as
an undirected graph where the vertices are processors and the
edges are the links in the system. For testing the system and
locating one faulty processor, a set of processors is selected
and each selected processor is assigned the task of testing the
vertices within distance t for malfunction. Whenever it
detects a fault of any kind, an error message is issued, specifying
only its origin. The minimum number of selected processors
needed is the minimum size of a t-identifying code.

II.  A  New  Lower  Bound  for  Infinite  Grids 

We  focus  on  the  following  four  infinite  2-dimensional  grids: 

- the square grid, $G_1$;

- the square grid with one diagonal (or triangular grid), $G_2$;

- the square grid with two diagonals, $G_3$;

- the hexagonal grid, $G_4$.

A simple lower bound (see [12]) states that the smallest
possible density $D_t^{(i)}$ of a t-identifying code in $G_i$ (i = 1, 2, 3, 4)
satisfies

$$D_t^{(i)} \ge \frac{2}{B_t^{(i)} + 1},$$

where $B_t^{(i)}$ denotes the size of a ball of radius t in $G_i$ (size
independent of the centre of the ball). Since for i = 1, 2, 3, 4
these sizes are given by polynomials of the second degree in t,
we have a lower bound on the density in $\Omega(t^{-2})$. For the four
grids, we improve this to $\Omega(t^{-1})$.

III.  Nonexistence  of  Perfect  Codes  for  t  >  1 

A  perfect  t-identifying  code  is  such  that  all  codewords  are 
covered  only  by  themselves,  and  all  non  codewords  are  covered 


by  exactly  two  codewords.  A  perfect  1-identifying  code  in  Gi 
is  given  in  [12]. 

We  prove  that  in  any  graph,  no  nontrivial  perfect  t- 
identifying  code  exists  unless  t  =  1. 

IV.  Complexity 

We prove that the following problem is NP-complete:
INSTANCE: a graph G = (V, E), an integer k;

QUESTION: is there a 1-identifying code $C \subseteq V$ of size at
most k?

Abstract — Randomized constructions are presented
for a family of linear-time encodable and decodable
error-correcting codes using irregular expander
graphs. These codes can be encoded in constant time
and decoded in at most logarithmic time if a linear
number of processors are used.

I.  Introduction 

We construct a family of linear-time encodable and
decodable error-correcting codes. These codes can also be
encoded by circuits of linear size and constant depth and
decoded by circuits of linear size and at most logarithmic
depth. The size of a circuit is defined as the number of
vertices, while the depth of a circuit is defined as the
maximum length of a directed path in the circuit. The use
of irregular expanders is motivated by a recent indication
that irregular graphs give better decoding performance
than regular graphs [2].

II.  Error  Reducing  Codes 
We refer to message nodes as left nodes and check nodes
as right nodes. We will further use x and c to represent
left and right nodes, respectively.

Definition 1 A code $\mathcal{R}$ of rn message bits and (1 − r)n
check bits is an error reducing code of rate r, error
reduction $\epsilon$, and reducible distance $\delta$ if there exists an algorithm
that, given an input word that differs from a codeword
$w \in \mathcal{R}$ in at most $\mu \le \delta n$ message bits and $\nu \le \delta n$ check
bits, outputs a word that differs from w in at most $\epsilon\nu$
message bits.

Definition 2 A bipartite graph is an $(\alpha, \beta)$ expander if
any subset S consisting of at most a fraction $\alpha$ of left
nodes has at least $\beta|\delta(S)|$ right node neighbors, where
$\delta(S)$ is the set of edges attached to nodes in S.

We will sometimes refer to an $(\alpha, \beta)$ expander of rn left
nodes and (1 − r)n right nodes as an $(rn, (1 - r)n, \alpha, \beta)$
expander.
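For a very small graph, Definition 2 can be verified by exhaustive search over left subsets. The sketch below is illustrative and exponential in the number of left nodes; it assumes a simple bipartite graph given as a hypothetical list `left_adj` of right-neighbor lists, so that $|\delta(S)|$ is the sum of the left degrees in S.

```python
from itertools import combinations

def is_expander(left_adj, alpha, beta):
    """left_adj[x] lists the right neighbors of left node x (no multi-edges assumed)."""
    n_left = len(left_adj)
    max_size = int(alpha * n_left)
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_left), size):
            edges = sum(len(left_adj[x]) for x in subset)        # |delta(S)|
            neighbors = set().union(*(left_adj[x] for x in subset))
            if len(neighbors) < beta * edges:
                return False
    return True

left_adj = [[0, 1], [1, 2], [2, 3], [3, 0]]
print(is_expander(left_adj, 0.5, 0.75))  # True
print(is_expander(left_adj, 0.5, 0.8))   # False
```

In this example two adjacent left nodes share one right neighbor, so 4 edges reach only 3 right nodes, which caps the achievable $\beta$ at 3/4.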

Theorem  3  If  B  is  an  irregular  (a,  ~  +  d  2  .  )  expander 
where  dx<rnin  is  the  minimum  degree  on  the  left  nodes  of 
B,  then  TZ(B)  is  an  error  reducing  code  of  error  reduc¬ 
tion  |  and  reducible  distance  — ,  where  dXttnaT  is  the 
maximum  degree  on  the  left  nodes  of  B. 

Theorem  4  If  B  is  an  irregular  (a,  ^  +  d  3  )  expander 

and  dx^miri  >  gdx^mQx  where  dx^rnin  and  dx,rnax  are  the 
the  minimum,  and  maximum  degrees  on  the  left  nodes  of 
B,  then  IZ(B)  is  an  error  reducing  code  of  error  reduction 
|  and  reducible  distance 

This work was supported by NSF Grant NCR-9725251.


III.  Encoding  and  Decoding 

The cascading method that we use in our construction
was originally developed by Luby et al. for the
construction of erasure codes [1]. Let each graph in the set $\{B_i\}$ of
irregular expander graphs have $\alpha^i k$ left nodes and $\alpha^{i+1} k$
right nodes. We associate each graph with an error
reducing code $\mathcal{R}(B_i)$ that has $\alpha^i k$ message bits and $\alpha^{i+1} k$
check bits, $0 \le i \le m$. We also use an error correcting
code C that has $\alpha^{m+1} k$ message bits and $\frac{\alpha^{m+2} k}{1 - \alpha}$ check
bits. To decode $\mathcal{C}(B_0, \ldots, B_m, C)$, we simply decode the
individual codes $\mathcal{R}(B_0), \ldots, \mathcal{R}(B_m)$, C in reverse order.
By choosing a code C that can be encoded and decoded
in quadratic time and choosing m such that $\alpha^{m+1} k \approx \sqrt{k}$,
we ensure that the code $\mathcal{C}(B_0, \ldots, B_m, C)$ can be encoded
and decoded in linear time.
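The bookkeeping of the cascade can be made concrete with a short calculation. The sketch below assumes that the final code C has $\alpha^{m+2} k / (1 - \alpha)$ check bits, the size under which the overall rate comes out as $1 - \alpha$; the function name `cascade_sizes` and the example values k = 1024, $\alpha$ = 1/2 and m = 4 are illustrative.

```python
from fractions import Fraction

def cascade_sizes(k, alpha, m):
    """Message/check sizes for R(B_0), ..., R(B_m), the final code C, and the rate."""
    stages = [(alpha**i * k, alpha**(i + 1) * k) for i in range(m + 1)]
    final_message = alpha**(m + 1) * k
    final_check = alpha**(m + 2) * k / (1 - alpha)   # assumed size for C
    total_check = sum(c for _, c in stages) + final_check
    rate = Fraction(k) / (k + total_check)
    return stages, (final_message, final_check), rate

stages, final, rate = cascade_sizes(1024, Fraction(1, 2), 4)
print(rate)  # 1/2
```

With these values the last stage has 32 message bits, which equals $\sqrt{k}$, so a quadratic-time code C costs only linear time overall.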


Theorem  5  Let  B ,  be  an  irregxdar  (alk,al+lk,ot,j  + 
— )  expander  where  dX)min  is  the  minimum  degree  of 
the  left  nodes  of  B ,,  0  <  i  <  m.  Let  C  be  an  error 
correcting  code  of  arn+lk  message  bits  and  - 1— check 
bits,  am+1k  «  Vk,  that  can  correct  a  random  — 
fraction  of  errors,  where  dX)max  is  the  maximum  degree 
of  the  left  nodes  of  a  Bt.  Then  C(B0,  •  •  • ,  Bm,C)  is  a  rate 
1  —  cv  error- correcting  code  that  can  be  encoded  in  linear 
time  and  can  correct  a  random 
in  linear  time. 


2d, 


fraction  of  errors 


Theorem  6  Let  Bt  be  an  irregular  (atk,a'l+1k,a,  yb  + 
Yj  1  )  expander,  dx,miri  QdXlrnnx ,  where  dx,rnin  and 

dx,max  are  the  minimum  and  maximum,  degrees  of  the  left 
nodes  of  Blr  0  <  i  <  m.  Let  C  be  an  error  correcting  code 
of  am+1k  message  bits  and  check  bits,  am+lk  « 

Vk,  that  can  correct  a  random  ^  fraction  of  errors.  Then 
C(Bo,  -  •  • ,  Bm,C)  is  a  rate  1  —  a  error- correcting  code  that, 
can  be  encoded  by  a  linear-size  circuit  of  constant  depth 
and  can  correct  a  random  ^  fraction  of  errors  in  a  linear- 
size  circuit  of  at  most  logarithmic  depth. 

Abstract — A simple algorithm for constructing search
trees for a given set of binary words is presented.
It is shown that the average cost of the result of this
algorithm and, hence, the average cost of the optimum
search tree are close to their natural lower bound.

I.  Introduction 

The problem of constructing a binary search tree for a
set of binary words has wide applications in computer science,
biology, mineralogy, etc. Construction of a tree of minimum
cost has attracted the attention of many authors [1], [2], [3]. It
is known to be an NP-hard problem [4]; therefore the
problem arises of finding simple algorithms for constructing nearly
optimum trees. We show in this paper that there is a simple
algorithm to construct search trees which are sufficiently close
to the optimum tree on average. By means of this algorithm
we prove that for the optimum tree the average number of bits
to be checked is close to its natural lower bound, i.e., the
binary logarithm of the number of given words: their difference
is less than 1.04 bit.


II.  Statement  of  the  Problem  and  the  Main 
Result 

Let a set of m binary words of length n (m > 0, n > 1)
be given. Let us define the cost of the search tree L by the
equality $C(L) = \frac{1}{m} \sum_{i=1}^{m} L_i$, where $L_i$ is the number of bits
required for identification of the i-th word.

We denote by $S_{n,m}$ the set of the initial data, i.e., the
collection of all sets of m binary words of length n ($n \ge \log_2 m$).

Now let us assume that an algorithm F builds a tree F(S)
from the set $S \in S_{n,m}$. As we will further consider randomized
algorithms, it will be convenient to denote by $\bar{C}(F(S))$ the
expectation of the cost of the tree $C(F(S))$ with respect to the
measure given by the considered algorithm. Let us now define
the average cost $t_{n,m}(F)$ of the algorithm F as follows:

$$t_{n,m}(F) = \frac{1}{\mathrm{Card}\, S_{n,m}} \sum_{S \in S_{n,m}} \bar{C}(F(S)),$$

where Card $S_{n,m}$ means the cardinality of the set $S_{n,m}$.
Now we consider perhaps the simplest randomized algorithm
for constructing a search tree, which will be denoted
by R. It works as follows.

Description of the algorithm R. This algorithm builds a
binary search tree from an arbitrary set of m binary words
of length n. If the given set contains only one word, then the
algorithm returns the simplest tree, consisting of one leaf, and
stops.

Otherwise, a randomly chosen bit position is assigned to
the root of the tree. For each of the
parts into which this check divides the entire set of words,
the search tree is constructed by the same method.

This work was supported by RFBR Grant 98-01-00772.
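A minimal sketch of algorithm R is given below. It relies on two assumptions that the description leaves open: the random position is drawn uniformly among positions that actually split the current set, and the cost of a tree is the average depth of its leaves. Words are represented as Python strings of '0' and '1', and all function names are illustrative.

```python
import random

def build_tree(words, rng):
    """Return a leaf (the word itself) or a node (position, subtree_0, subtree_1)."""
    if len(words) == 1:
        return words[0]
    n = len(words[0])
    # Assumption: only positions on which the words disagree are candidates.
    splitting = [p for p in range(n) if len({w[p] for w in words}) == 2]
    p = rng.choice(splitting)
    zeros = [w for w in words if w[p] == "0"]
    ones = [w for w in words if w[p] == "1"]
    return (p, build_tree(zeros, rng), build_tree(ones, rng))

def depths(tree, d=0):
    """Map each word to the number of bit checks needed to identify it."""
    if isinstance(tree, str):
        return {tree: d}
    _, left, right = tree
    out = depths(left, d + 1)
    out.update(depths(right, d + 1))
    return out

def cost(tree):
    """Average number of checked bits over all words (assumed cost measure)."""
    ds = depths(tree)
    return sum(ds.values()) / len(ds)

tree = build_tree(["00", "01", "10"], random.Random(0))
print(cost(tree))
```

For these three words either choice of root position yields two leaves at depth 2 and one at depth 1, so the cost is 5/3, slightly above $\log_2 3$.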

The main result of this paper is the following theorem:

Theorem 1 For the average cost of the algorithm R the
following inequality holds:

$$t_{n,m}(R) \le \log_2 m + \frac{29}{28} + \frac{\log_2(2m)}{m}. \qquad (1)$$

From this result the following corollary is readily deduced:

Corollary 1 Let $F_{opt}$ be the algorithm which builds an
optimum tree for each data set. Then

$$t_{n,m}(F_{opt}) \le \log_2 m + \frac{29}{28} + \frac{\log_2(2m)}{m}.$$

The following corollary contains an estimate for the cost
of the search tree constructed by R for almost all data sets,
instead of the average estimate. It is an obvious consequence
of the Markov-Chebyshev inequality.

Corollary  2  Let  us  assign  equal  probability  to  every  set  S 
from  the  set  of  initial  data  Sn,m  (m  >  2,  n  >  log2  m).  Then 
for  every  e  >  0  the  inequality  holds: 

P{S :  C(R(S))  <  (1  +  e)  log2  m)  >  1  - 

The  same  estimate  evidently  holds  for  the  cost  of  the  opti¬ 
mum  search  tree. 

Abstract — Binary prefix-free codes in which all
codewords end with “1” are considered. A recursion
is given to construct all “optimal” 1-ended codes and
to compute the number of such codes with n
codewords.

I.  Introduction  and  Definitions 

The problem of finding an optimal D-ary prefix-free code for
coding a source with finite output alphabet and known output
probabilities has been solved by Huffman [4]. In [1], Berger
and Yeung considered the same problem restricted to binary
codes whose codewords all end with a “1”. As all codes with
the same multiset of codeword lengths are equivalent and form
an equivalence class, it is enough to look at only one code in
each class. Berger and Yeung found a family of classes called
potential classes, which contains all optimal codes. In [2],
Capocelli et al. restricted the family of classes in which all
optimal codes must lie to the e-potential classes. Golin and
Chan [3] found a polynomial-time algorithm for finding the
best one-ended code for a given probability distribution.

Our  contribution  is  to  determine  the  family  of  optimal 
codes  exactly.  We  also  give  a  method  to  compute,  for  any 
n > 1, the number of optimal classes of codes with n codewords.

We consider probabilities in non-increasing order and collect
them into a probability vector $p = (p_1, \ldots, p_n)$. A code
with codeword lengths $w_1 \le \cdots \le w_n$ has length vector
$w = (w_1, \ldots, w_n)$ and multiplicity vector $x = (x_1, \ldots, x_{\max_i w_i})$,
where $x_i$ is the number of codewords of length $i$. Length vectors
and multiplicity vectors determine each other uniquely.
Our optimality criterion is the following.

For length vectors $w$ and $v$ with $n$ components, $w$ is better
than $v$ if $\sum_{i=1}^{n} w_i p_i \le \sum_{i=1}^{n} v_i p_i$ for all probability vectors $p$
and if there is at least one probability vector for which equality
does not hold. This defines a partial ordering. A code with
length  vector  w  is  better  than  a  code  with  length  vector  v  if 
w  is  better  than  v.  A  multiplicity  vector  x  (corresponding 
to  a  length  vector  w)  is  better  than  a  multiplicity  vector  y 
(corresponding  to  a  length  vector  v)  if  w  is  better  than  v. 

A  length  vector  is  optimal  if  there  is  no  better  length  vector 
of  the  same  length.  Optimal  multiplicity  vectors  and  optimal 
codes  are  defined  accordingly. 

II.  All  Optimal  Multiplicity  Vectors 

Theorem 1: Let $f(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i 2^{-i} + x_n 2^{-n}$. A
multiplicity vector is optimal if and only if it has one of the
following forms (X is a binary string that can be empty):

• $(X, a, b, b, b)$ with $b - a \ge 2$ even and $f(X, a, b) = 1$;

• $(X, a, b, b, b - 1)$ with $b - a \ge 2$ even and $f(X, a, b) = 1$;

• $(X, a, a, a, b)$ with $1 \le b \le a$ and $f(X, a) = 1$.

1 This work was performed while the author was with the Signal
and Information Processing Laboratory, ETH Zurich, Zurich,
Switzerland.


Theorem 2: Starting from the optimal multiplicity vector (1, 1, 1, 1),
the following operations on multiplicity vectors allow one to
construct all optimal multiplicity vectors; moreover, the construction
is unique in the sense that every optimal multiplicity vector
is obtained by exactly one sequence of operations:

• $(X, a, a, a, b) \mapsto (X, a, a, a, b + 1)$ $(1 \le b \le a - 1)$;

• $(X, a, b, b, b) \mapsto (X, a, b, b, b, 1)$ and $\mapsto (X, a - 1, b + 1, b + 1, b)$ $(b - a \ge 0$ even, $a \ge 1)$;

• $(X, 0, b, b, b) \mapsto (X, 0, b, b, b, 1)$ $(b \ge 2$ even$)$;

• $(X, a, b, b, b - 1) \mapsto (X, a, b, b, b)$ $(b - a \ge 2$ even$)$.
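
The construction of Theorem 2 can be tried out mechanically. The following is a minimal sketch, not the authors' program: it reads every inequality in the four operations as non-strict (an assumption about the typeset original), represents multiplicity vectors as Python tuples, and uses the hypothetical helper names `successors` and `optimal_vectors`. It only covers n of at least 4, since the construction starts from (1, 1, 1, 1).

```python
def successors(x):
    """Apply every applicable operation of Theorem 2 to the tuple x."""
    out = []
    if len(x) < 4:
        return out
    head = x[:-4]
    p, q, r, s = x[-4:]
    # (X,a,a,a,b) -> (X,a,a,a,b+1) for 1 <= b <= a-1
    if p == q == r and 1 <= s <= p - 1:
        out.append(head + (p, q, r, s + 1))
    # (X,a,b,b,b) with b-a >= 0 even
    if q == r == s and q - p >= 0 and (q - p) % 2 == 0:
        a, b = p, q
        if a >= 1:
            out.append(x + (1,))
            out.append(head + (a - 1, b + 1, b + 1, b))
        elif b >= 2:
            # a == 0: only the extension (X,0,b,b,b,1) is allowed
            out.append(x + (1,))
    # (X,a,b,b,b-1) -> (X,a,b,b,b) for b-a >= 2 even
    if q == r and s == q - 1 and q - p >= 2 and (q - p) % 2 == 0:
        out.append(head + (p, q, q, q))
    return out


def optimal_vectors(n):
    """All vectors with component sum n reachable from (1,1,1,1); n >= 4."""
    level = [(1, 1, 1, 1)]
    for _ in range(4, n):
        level = [y for x in level for y in successors(x)]
    return level


for n in range(4, 12):
    print(n, len(optimal_vectors(n)))
```

Each operation adds exactly one codeword, so `optimal_vectors(n)` is built level by level; the counts it prints for small n can be compared with the values of A(n) in Tab. 1.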

Corollary: Denote by $A(n)$ the number of optimal multiplicity
vectors whose components sum to $n$. Then
$A(n) = \sum_{0 \le a \le n/3} \sum_{1 \le b \le n/3} g(n, a, b)$, where $g$ behaves as follows: for
$1 \le n \le 4$, $g(n, a, b) = 1$ if and only if $a = b = 1$ and
$g(n, a, b) = 0$ otherwise. For $n \ge 5$, $g$ satisfies the following
recursions:

1. $g(n, b, 1) = \sum_{a = 0,\ b - a\ \text{even}} g(n - 1, a, b)$ $(b \ge 1)$;

2. $g(n, a, b) = g(n - 1, a, b - 1)$ $(2 \le b \le a)$;

3. $g(n, a, b) = g(n - 1, a, b - 1)$ $(b - a \ge 2$ even, $a \ge 0)$;

4. $g(n, a, b) = g(n - 1, a + 1, b)$ $(b - a \ge 1$ odd, $a \ge 0)$.

The  table  below  gives  the  first  values  of  A(n). 


 n  A(n)    n  A(n)    n  A(n)    n  A(n)
 1     1   11    13   21   174   31  1574
 2     1   12    17   22   219   32  1929
 3     1   13    23   23   278   33  2362
 4     1   14    30   24   348   34  2881
 5     2   15    39   25   437   35  3511
 6     3   16    50   26   544   36  4264
 7     4   17    65   27   678   37  5174
 8     5   18    83   28   839   38  6258
 9     7   19   107   29  1039   39  7560
10     9   20   136   30  1279   40  9107

Tab.  1:  The  number  A(n)  of  optimal  length  vectors  of  length  n. 
Abstract — We study the asymptotic growth of ordered
trees and give insights into the coding of trees
from the information-theoretic viewpoint.
Specifically, we give the length function that is optimal in the
sense that the Kraft inequality is satisfied with equality.
It is shown that the commonly used pre-order
coding is asymptotically tight for special classes of trees,
but not always for other classes of trees.

I. k-ARY TREES AND GENERALIZED CATALAN NUMBERS

For $k \ge 2$ we define a $k$-ary tree $T$ as follows: either $T$ is
empty or it has a specific node called its root that is connected
to $T_1, T_2, \ldots, T_k$, each of which is a $k$-ary tree. We denote by
$\mathcal{T}_k^{(m)}$ the set of all $k$-ary trees with $m$ internal nodes. The
cardinality $c_{k,m}$ of $\mathcal{T}_k^{(m)}$ is known as the generalized Catalan
number,

$c_{k,m} = \frac{1}{km + 1} \binom{km + 1}{m}.$  (1)

Although each $k$-ary tree having $m$ internal nodes is often
identified with a binary pre-order prefix sequence of length
$km + 1$, the following theorem suggests the existence of a more
efficient code for $k$-ary trees when $k$ is greater than two.

Theorem 1 [1] For $k \ge 2$, we have

$\sum_{m=0}^{\infty} c_{k,m}\, 2^{-\{g(k) m + \log_2 \frac{k}{k-1}\}} = 1.$  (2)


Then, the coefficient $u_{\mathbf{k},n}$ is given by using the generalized
Catalan numbers,

$u_{\mathbf{k},n} = \sum_{1 + \sum_{i=1}^{s} k_i n_i = n} \frac{1}{n} \binom{n}{n_1, n_2, \ldots, n_s, n - \sum_{i=1}^{s} n_i}.$  (5)

Each term in the above sum (5) represents the number of trees
that have $n_i$ internal nodes with $k_i$ outgoing branches for
$i = 1, \ldots, s$.

The next theorem determines the size of the largest term in the
above sum.


Theorem  2 

.  1  +  ukl  +  uk2  +  •  •  •  +  uk 


H(p(„pi . p„) 


u>o  u  y"‘  ,,.=1 

(6 

where $H(p_0, p_1, \ldots, p_s) = \sum_{i=0}^{s} -p_i \log p_i$ is the entropy.

III. Optimum length function for k-ARY tree


Setting 


1  1  +  uk>  +  ■  ■  ■  +  uk< 


K  P 


where $g(k) = k \log_2 k - (k - 1) \log_2 (k - 1) = k\,h(1/k)$ and
$h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ is the binary entropy
function.
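
The Kraft-type sum of Theorem 1 can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: it takes $c_{k,m} = \binom{km+1}{m}/(km+1)$, reads the sum as starting at $m = 0$ with right-hand side 1, and uses the identity $2^{-g(k)} = (k-1)^{k-1}/k^k$ that follows from the definition of $g(k)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def generalized_catalan(k, m):
    """Number of k-ary trees with m internal nodes (exact integer)."""
    return comb(k * m + 1, m) // (k * m + 1)


def kraft_partial_sum(k, m_max):
    """Partial sum over m = 0..m_max of c_{k,m} * 2^{-(g(k) m + log2(k/(k-1)))}."""
    ratio = Fraction((k - 1) ** (k - 1), k ** k)  # equals 2^{-g(k)}
    scale = Fraction(k - 1, k)                    # equals 2^{-log2(k/(k-1))}
    total, power = Fraction(0), Fraction(1)
    for m in range(m_max + 1):
        total += generalized_catalan(k, m) * power
        power *= ratio
    return float(scale * total)


for k in (2, 3, 4):
    print(k, kraft_partial_sum(k, 50), kraft_partial_sum(k, 400))
```

The partial sums increase towards 1 but do so slowly, because the terms decay only like $m^{-3/2}$; exact rational arithmetic is used so that the large binomial coefficients do not overflow.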

II.  k-ARY  TREES 

Let us extend the results for the $k$-ary tree in the previous
section to the $\mathbf{k}$-ary tree, where $\mathbf{k} = (k_1, k_2, \ldots, k_s)$ is
a vector of distinct positive integers.

Thus, a $\mathbf{k}$-ary tree $T$ is recursively defined either to be a
single node (leaf) or to have a specific node called its root that
is connected to $T_1, T_2, \ldots, T_{k_i}$ for some $i$, each of which is a
$\mathbf{k}$-ary tree. We denote by $\mathcal{T}_{\mathbf{k}}^{(n)}$ the set of all $\mathbf{k}$-ary trees with
$n$ nodes, counting external and internal nodes together.

From the symbolic consideration, it can be deduced that
the generating function

$U = U_{\mathbf{k}}(z) = \sum_{n} u_{\mathbf{k},n} z^n$


$1 + u^{k_1} + u^{k_2} + \cdots + u^{k_s} = k_1 u^{k_1} + k_2 u^{k_2} + \cdots + k_s u^{k_s},$  (8)

we  can  deduce  from  analytical  considerations  that  p  is  the 
dominant  positive  singularity  of  Uk(z),  and 


Uk(p)  = 


That  is,  we  have 


Theorem  3 


-{(logK)ll  +  log  U) 


satisfies the following functional equation:

$U = z + zU^{k_1} + zU^{k_2} + \cdots + zU^{k_s},$  (4)

where $u_{\mathbf{k},n}$ is the cardinality of $\mathcal{T}_{\mathbf{k}}^{(n)}$, that is, the number of
$\mathbf{k}$-ary trees of size $n$.
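
The coefficients of the generating function can be extracted directly from a functional equation of this kind. The following is a small sketch under the assumption that the equation reads $U = z\,(1 + U^{k_1} + \cdots + U^{k_s})$; it iterates the equation on truncated power series (plain Python lists of integers), and the name `tree_counts` is hypothetical.

```python
def tree_counts(ks, n_max):
    """Coefficients u[0..n_max] of U(z) solving U = z * (1 + sum_i U**k_i)."""

    def mul(a, b):
        # Product of two power series, truncated after degree n_max.
        out = [0] * (n_max + 1)
        for i, ai in enumerate(a):
            if ai:
                for j, bj in enumerate(b):
                    if i + j > n_max:
                        break
                    out[i + j] += ai * bj
        return out

    def power(a, e):
        result = [1] + [0] * n_max
        for _ in range(e):
            result = mul(result, a)
        return result

    u = [0] * (n_max + 1)
    for _ in range(n_max):  # each pass fixes at least one more coefficient
        phi = [1] + [0] * n_max
        for k in ks:
            phi = [x + y for x, y in zip(phi, power(u, k))]
        u = [0] + phi[:n_max]  # multiplication by z
    return u


print(tree_counts((2,), 7))     # trees whose internal nodes all have 2 children
print(tree_counts((1, 2), 6))   # unary-binary trees
```

For `ks = (2,)` the nonzero coefficients are the Catalan numbers, and for `ks = (1, 2)` they are the Motzkin numbers, which gives two independent checks of the equation.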


Thus, the length function $l_{\mathbf{k}}(n) = (\log K) n + \log u$ satisfies
the Kraft inequality with equality. This function is the best
possible in the sense that the coefficient of the linear term cannot be
made smaller than $\log K$ if we want to have separable
codes for $\mathbf{k}$-ary trees.
Abstract — A multilevel arithmetic coding algorithm
is proposed to encode data sequences with large
or unbounded source alphabets. The algorithm is universal
in the sense that it can achieve asymptotically
the entropy rate of any independently and identically
distributed integer source with a finite or infinite alphabet,
as long as the mean value is finite.

I.  Introduction 

In many data compression systems, one often has to efficiently
compress integer sequences. For example, in run-length
coding, one has to efficiently encode a sequence of runs
of 0's and 1's, which is transformed from the original binary
sequence; in grammar-based coding [4], one has to efficiently
compress a sequence of integers with a potentially unbounded
number of distinct integers.

When the size of the alphabet from which data sequences
are drawn is large enough, however, the problem of universal
compression of these data sequences is not as simple as it
may look. Due to the well-known underflow and overflow
problems, finite precision implementations of the traditional
adaptive arithmetic coding [2] cannot work if the size of the
source alphabet exceeds a certain limit. On the other hand,
although some existing coding schemes such as the Golomb
codes, Elias codes [1], and their variants can process integer
sequences with infinite alphabets, they are not universal in
the sense that, for most memoryless sources, their compression
rates are strictly above the entropy rates of these sources.

In this study, we propose a new practical coding method,
called multilevel arithmetic coding, to encode data sequences
with large or even unbounded alphabets. For any data sequence
$X = x_1 x_2 \cdots x_n$ to be compressed, let $S_X$ denote the
set that consists of all the distinct symbols appearing in $X$.
In general, as $X$ gets longer and longer, $S_X$ may grow without
bound. This new method converts the dynamically changing
set $S_X$ into a dynamic tree, whose leaves represent small subsets
of $S_X$ and, together, form a partition of $S_X$. For each
symbol $x_i$ in the sequence $X$, let $y_i$ denote the path in the
tree from the root to the leaf containing the symbol $x_i$. Let $z_i$
denote the index of $x_i$ in the corresponding leaf sub-alphabet.
The sequence $X$ is then fully represented by the sequences
$Y = y_1 y_2 \cdots y_n$ and $Z = z_1 z_2 \cdots z_n$. From information theory,
we have

$H(X) = H(Y, Z) = H(Y) + H(Z|Y),$  (1)

where $H(X)$, $H(Y, Z)$, and $H(Y)$ are the empirical entropies of
the input sequence $X$, the path and index sequence $(Y, Z)$, and
the path sequence $Y$, respectively, and where $H(Z|Y)$ is the
empirical conditional entropy of the index sequence given the
path sequence. The above equation implies that to encode $X$,
one may instead encode $Y$ first and then conditionally encode
$Z$ given $Y$. This forms the information theoretical basis for
the proposed multilevel arithmetic coding algorithm.

1 This work was supported in part by the Natural Sci¬
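
The entropy decomposition can be verified on a toy sequence. The sketch below is only an illustration under simplifying assumptions: instead of the dynamic tree of the paper it uses a fixed, hypothetical partition in which symbol s belongs to leaf `s // leaf_size` and has index `s % leaf_size`, so that the pair (y, z) determines the symbol uniquely.

```python
from collections import Counter
from math import log2


def empirical_entropy(seq):
    """Empirical (zeroth-order) entropy of a sequence, in bits per symbol."""
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in Counter(seq).values())


def conditional_entropy(zs, ys):
    """Empirical H(Z|Y): average over y of the entropy of Z where Y = y."""
    groups = {}
    for y, z in zip(ys, zs):
        groups.setdefault(y, []).append(z)
    n = len(ys)
    return sum(len(g) / n * empirical_entropy(g) for g in groups.values())


def split(seq, leaf_size):
    """Hypothetical static partition standing in for the dynamic tree."""
    ys = [s // leaf_size for s in seq]   # which leaf (the "path")
    zs = [s % leaf_size for s in seq]    # index inside the leaf
    return ys, zs


X = [3, 7, 3, 12, 0, 7, 7, 25, 3, 12]
Y, Z = split(X, 4)
lhs = empirical_entropy(X)
rhs = empirical_entropy(Y) + conditional_entropy(Z, Y)
print(lhs, rhs)
```

The two printed values agree up to rounding, which is the identity that justifies coding the path sequence first and the index sequence conditionally.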

II.  Algorithm  Description  and  Optimality 
Result 

Consider the general case that the alphabet may increase
without bound, and the decoder does not know how it grows.
To encode such a data sequence $X = x_1 x_2 \cdots x_n$, we combine
Elias coding [1] with a dynamically updated binary search
tree. The proposed algorithm works as follows: For each symbol
$x_i$ in the input sequence, if it has not appeared before in
$x_1 \cdots x_{i-1}$, use the Elias code to encode $x_i$ and then add this
symbol to the corresponding leaf sub-alphabet and update the
tree structure; if $x_i$ has appeared before, then encode the
corresponding path in the dynamic tree and the index in the
corresponding leaf sub-alphabet. For the details of how the
dynamic tree is updated, and other details of the algorithm,
please see the full paper [3]. Here we just give the following
theorem without proof.

Theorem 1 For any i.i.d. integer source $\{x_i\}_{i=1}^{\infty}$ with finite
mean, the proposed algorithm can achieve asymptotically the
entropy rate of the source.

III.  Conclusion 

The advantages of the proposed algorithm over the traditional
adaptive arithmetic coding algorithm are twofold: (1) the
proposed algorithm can be used to encode any data sequence,
no matter whether the corresponding source alphabet is finite
or infinite, while the traditional adaptive arithmetic coding
algorithm can work only for data sequences with bounded,
small alphabets; (2) in the situation in which the traditional
adaptive arithmetic coding algorithm can work, the proposed
algorithm can reduce coding complexity and improve
compression performance.
Abstract — We describe a mechanical model for representing
discrete distributions and show that it leads
to an efficient test for the possibility of key agreement
unconditionally secure against active adversaries.

I.  Motivation 

Assume that two parties Alice and Bob have access to
independent realizations of the random variables $X$ and $Y$,
respectively, and that an adversary Eve knows $Z$. Let $P_{XYZ}$
be the joint distribution of the three random variables. Can
Alice generate a string $M$ such that Bob is convinced that
$M$ comes from Alice and not from Eve? Clearly, the answer
to this question depends on $P_{XYZ}$, more precisely, on the
following property of $P_{XYZ}$.

Definition 1. Let $X$, $Y$, and $Z$ be random variables.
Then $X$ is simulatable by $Z$ with respect to $Y$, denoted by
$\mathrm{sim}_Y(Z \to X)$, if there exists a conditional distribution $P_{\bar{X}|Z}$
such that $P_{\bar{X}Y} = P_{XY}$ holds, where $P_{\bar{X}Y} = \sum_{z} P_{YZ} \cdot P_{\bar{X}|Z}$.

It is not surprising that Eve can impersonate Alice towards
Bob if and only if $\mathrm{sim}_Y(Z \to X)$ holds. In case of
non-simulatability, the string $M$ can be a sufficiently long block
of independent realizations of $X$.

Another, closely related, application of the simulatability
condition is the following. The XYZ-scenario was considered
with respect to the question whether Alice and Bob
can, by communication over an insecure channel, generate
a secret key $S$ about which the adversary has virtually no
information. As the important quantities in this context,
the secret-key rate $S(X;Y\|Z)$, with respect to passive
adversaries, and the robust secret-key rate $S^*(X;Y\|Z)$, secure
against active adversaries with complete control over the
communication channel, were defined [1]. It was shown that
either $S^*(X;Y\|Z) = S(X;Y\|Z)$ or $S^*(X;Y\|Z) = 0$ holds,
and that the simulatability condition separates the two cases:
If neither $\mathrm{sim}_Y(Z \to X)$ nor $\mathrm{sim}_X(Z \to Y)$ holds, then
secret-key agreement secure against active adversaries is possible at
the same rate as against passive wire-tappers, but completely
impossible otherwise.

Unfortunately,  the  simulatability  condition  is  a  priori  not 
very  helpful  since  it  is  not  clear  how  it  can  be  verified  in  finite 
time,  let  alone  efficiently.  It  is  the  goal  of  this  note  to  present 
a  new  intuitive  formalism  based  on  a  mechanical  model,  and 
to  show  that  this  leads  to  efficient  criteria  for  simulatability. 

II.  A  Mechanical  Model  for  Discrete 
Distributions  and  Channels 

Let us consider the following representation of joint
distributions of discrete random variables $U$ and $V$. For simplicity,
we assume that $V$ is binary, i.e., $\mathcal{V} = \{v_0, v_1\}$. Then the
constellation $M_{U \leftarrow V}$ is defined by the list of pairs
$M_{U \leftarrow V} := (P_U(u), P_{V|U=u}(v_0))_{u \in \mathcal{U}}$. The pairs of such a constellation
$M = (m_i, a_i)_{i=1,\ldots,l}$ can be represented as mass points in the
interval $[0, 1]$, where $m_i$ determines the mass of a point, and
$a_i$ is its position. (This representation is one-dimensional
because $V$ is binary.)

1 Department of Computer Science, ETH Zurich, CH-8092

[Figure: four mass points $m_1, m_2, m_3, m_4$ on the interval $[0, 1]$ at positions $a_1\ (= 0)$, $a_2$, $a_3$, $a_4\ (= 1)$.]

Definition 2. Let $M = (m_i, a_i)_{i=1,\ldots,l}$ be a constellation with
$\sum m_i = 1$. The center of gravity of $M$ is defined as $\sum m_i a_i$.
We say that a constellation $M' = (m'_i, a'_i)_{i=1,\ldots,l'}$ is derived
from $M$ by mass splitting if it arises from $M$ by replacing a
pair $(m_i, a_i)$ by two pairs $(p m_i, a_i)$ and $((1 - p) m_i, a_i)$ for
some $0 < p < 1$. Furthermore, $M'$ is derived from $M$ by
mass union if two pairs $(m_i, a_i)$ and $(m_j, a_j)$ are replaced
by the single pair $(m_i + m_j, (m_i a_i + m_j a_j)/(m_i + m_j))$,
corresponding to the sum mass in the center of gravity of the
two masses. We call mass splitting and mass union basic mass
operations. Neither of them changes the center of gravity.
A constellation $M$ is called stronger than $M'$
if there exists a finite sequence of basic mass operations
that transforms $M$ into $M'$.

Note first that $\mathrm{sim}_Y(Z \to X)$ is equivalent to
$M_{Z \leftarrow Y}$ being stronger than $M_{X \leftarrow Y}$. The reason is that a channel $P_{\bar{X}|Z}$
can be translated into a sequence of basic mass operations
in the mechanical model, and vice versa. However, this
does not directly lead to an efficiently verifiable criterion for
simulatability. It is only a reformulation of the condition. We
now define a property of a pair of mass constellations which
is efficiently checkable and equivalent to one constellation
being stronger than the other.

Definition 3. For a mass constellation $M$ and for $0 < t < 1$,
we denote by $\mathcal{L}_t(M)$ the leftmost masses of $M$ of total amount
$t$. A constellation $M'$ is called more centered than $M$,
denoted by $M' \prec M$, if for all $t$, $c(\mathcal{L}_t(M')) \ge c(\mathcal{L}_t(M))$ holds,
where $c(S)$ stands for the center of gravity of a set $S$ of masses.

Given two mass constellations $M$ and $M'$, this condition
can be checked in linear time. Indeed, note that $M' \prec M$ is
equivalent to the fact that for every $1 \le k \le l'$, the center of
the set of masses $m'_1, \ldots, m'_k$ is not left of (i.e., smaller than)
the center of

Theorem 1. Let $P_{XYZ}$ be the joint distribution of random
variables $X$, $Y$, and $Z$, where $Y$ is binary. Then
$\mathrm{sim}_Y(Z \to X)$ is equivalent to $M_{X \leftarrow Y} \prec M_{Z \leftarrow Y}$.
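
The "more centered" relation is straightforward to test in code. The following is a minimal sketch and not the linear-time procedure mentioned in the text: it assumes the condition compares the centers of gravity of the leftmost mass of total amount t with a non-strict inequality, represents a constellation as a list of (mass, position) pairs of `Fraction`s with total mass 1, and checks every cumulative-mass breakpoint of either constellation (sufficient because both first-moment curves are piecewise linear). All function names are hypothetical.

```python
from fractions import Fraction


def leftmost_moment(constellation, t):
    """First moment (sum of mass * position) of the leftmost total mass t."""
    moment, remaining = Fraction(0), Fraction(t)
    for mass, pos in sorted(constellation, key=lambda p: p[1]):
        take = min(mass, remaining)
        moment += take * pos
        remaining -= take
        if remaining == 0:
            break
    return moment


def breakpoints(constellation):
    """Cumulative masses, scanning the points from left to right."""
    total, points = Fraction(0), []
    for mass, _ in sorted(constellation, key=lambda p: p[1]):
        total += mass
        points.append(total)
    return points


def more_centered(m_prime, m):
    """True if the leftmost-t center of m_prime is never left of that of m."""
    ts = sorted(set(breakpoints(m_prime) + breakpoints(m)))
    # For equal t > 0, comparing centers is the same as comparing moments.
    return all(leftmost_moment(m_prime, t) >= leftmost_moment(m, t) for t in ts)


half = Fraction(1, 2)
M = [(half, Fraction(0)), (half, Fraction(1))]
M_prime = [(half, Fraction(1, 4)), (half, Fraction(3, 4))]
print(more_centered(M_prime, M), more_centered(M, M_prime))
```

In the setting of Theorem 1 one would call `more_centered` with the constellation of X as the first argument and that of Z as the second; exact fractions avoid spurious failures from floating-point rounding at ties.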

If $Y$ is $N$-ary, the distribution can be represented in an
$(N - 1)$-dimensional space. However, the straightforward
generalization of the above condition is not always sufficient. It is
an open problem to find an efficient test for the general case.
Abstract — In the original definitions of information-theoretic
secret-key agreement, the required secrecy
condition was too weak. We show, by a generic reduction,
that it can be strengthened without any effect
on the achievable key-generation rate.

I.  Models  of  Information-Theoretic 
Secret-Key  Agreement 

Motivated by Wyner’s wire-tap channel [7], different settings
for information-theoretic secret-key agreement have been
proposed by Csiszár and Körner [3] and Maurer [5]. Whereas
in the model of [3], Alice is connected to Bob and Eve by a
noisy broadcast channel characterized by $P_{YZ|X}$ (Alice sends
$X$ and Bob and Eve receive $Y$ and $Z$, respectively), only
correlated information, but not insecure communication, is regarded
as a resource in the model of [5]. Here, the parties Alice and
Bob are connected by a noiseless and authentic but otherwise
insecure channel and have access to random variables $X$ and
$Y$, respectively, whereas the adversary knows $Z$.

In both settings, the capability of generating a secret key
has been defined asymptotically as the maximal achievable
key-generation rate (i.e., the number of resulting key bits per
channel use or per realization of the triple $XYZ$, respectively)
such that the adversary obtains information at an arbitrarily
small rate only. The corresponding quantities were called
the secrecy capacity $C_S(P_{YZ|X})$ [3] and the secret-key rate
$S(X;Y\|Z)$ [5], respectively. However, the secrecy condition
which only limits the rate at which Eve obtains information
about the key does not imply that the adversary’s information
is bounded in an absolute sense, let alone negligibly small.
This is clearly unsatisfactory and motivated the definition of
strong variants of secrecy capacity, $\overline{C}_S(P_{YZ|X})$ [2], and
secret-key rate, $\overline{S}(X;Y\|Z)$ [4], requiring that the adversary’s
information about the resulting key is small in total.

In [4], a lower bound on $\overline{S}(X;Y\|Z)$ was shown, whereas
in [2], a result similar to Corollary 2 below was proved
(with techniques different from ours). In this note we describe
a generic method for strengthening the security of any
information-theoretic key agreement by using only a negligible
amount of extra communication from Alice to Bob and such
that the effective key-generation rate is asymptotically equal
to the rate with respect to the weak definition.

II.  A  General  Method  for  Strengthening  the 
Security 

Definition 1. Let $\varepsilon > 0$ be a real number and let $N$ be a
positive integer. A weak key agreement with parameters $\varepsilon$ and
$N$ ($KA(\varepsilon, N)$ for short) between two parties Alice and Bob
and with respect to an adversary Eve outputs three random
variables $S_A$, $S_B$, and $U$, known to Alice, Bob, and Eve,
respectively, such that $\mathrm{Prob}[S_A \ne S_B] \le \varepsilon$, $H(S_A) \ge (1 - \varepsilon)N$,
and $I(S_A; U) \le \varepsilon N$ hold.

Such key agreement is called strong, denoted by $\overline{KA}(\varepsilon, N)$,
if the random variables $S_A$, $S_B$, and $U$ satisfy the following
more restrictive conditions. There must exist a string $S$ with
$\mathrm{Prob}[S = S_A = S_B] \ge 1 - \varepsilon$, $H(S) = \log|\mathcal{S}| \ge (1 - \varepsilon)N$, and
$I(S; U) \le \varepsilon$.

Theorem 1. Assume that a noiseless channel from Alice to
Bob is given to which Eve has perfect read access. Then weak
key agreement can be converted into strong key agreement
such that the key is generated asymptotically at the same
rate and the amount of required extra communication is
asymptotically vanishing. More precisely, for every $\varepsilon > 0$
there exists $a > 0$ such that for all sufficiently large $M$
and for all sufficiently large $N$, the strong key agreement
$\overline{KA}(\varepsilon, N)$ can be reduced to
$K = (1 + o(1))N/M$ realizations of the weak key agreement $KA(a, M)$ such that the
length $\mathrm{len}(C)$ of the message $C$ sent over the insecure channel
by Alice is of order $\mathrm{len}(C) = o(N)$.

The  proof  idea  is  as  follows.  First,  weak  key  agreement 
is  repeated  many  times.  Then,  error  correction  information  is 
sent from Alice to Bob (and hence to Eve), allowing Bob to
reconstruct Alice’s sequence of weak keys with high probability.
Finally,  this  string  is  transformed  into  a  highly  secret  key  by 
privacy  amplification.  Universal  hashing,  as  proposed  in  [1],  is 
not  a  good  choice  for  hashing  the  string  in  this  situation  since 
the  required  amount  of  communication,  i.e.,  the  specification 
of  a  particular  function  from  the  universal  class,  would  be  too 
high  (thus  reducing  the  achievable  key-generation  rate  in  the 
broadcast-channel  model).  As  a  new  method  in  this  context, 
we  use  extractors  [6]  instead.  This  allows  for  keeping  the  extra 
communication  negligible. 

Theorem  1  directly  implies  that  in  both  models  described 
above,  the  secrecy  requirements  can  be  strengthened  without 
effect  on  the  achievable  key-generation  rates. 

Corollary 2. $\overline{C}_S(P_{YZ|X}) = C_S(P_{YZ|X})$.

Corollary 3. $\overline{S}(X;Y\|Z) = S(X;Y\|Z)$.
Theorem 1 Assume the attacker knows $\tilde{Q}$ and the decoder
knows $\tilde{Q}$ and $Q$. For any attack subject to distortion $D_2$, a
rate $R$ is achievable iff $R < C$, where

$C = \max_{\tilde{Q}(\tilde{x}, u|x, k) \in \tilde{\mathcal{Q}}} \ \min_{Q(y|\tilde{x}) \in \mathcal{Q}} J(\tilde{Q}, Q),$  (1)

$(U, X, K) \to \tilde{X} \to Y$ is a Markov chain, and

$J(\tilde{Q}, Q) = I(U; Y|K) - I(U; X|K).$

If $K = X$ (host data available at the decoder), the solution
becomes a saddlepoint of $I(\tilde{X}; Y|X)$ [1].
Abstract — Information hiding is analyzed as a communication
game between an information hider and
an attacker, in which side information is available to
the information hider and to the decoder. Capacity
formulas are derived.

I.  Statement  of  the  Problem 

Information hiding is an emerging research area which
encompasses applications such as watermarking, fingerprinting,
and steganography. This paper extends results from [1]; see
[2] for more details.

Consider a host-data source producing random variables
$X$ taking values in a finite alphabet $\mathcal{X}$, a cryptographic-key
source producing random variables $K \in \mathcal{K}$, and a message
source producing a message $M$ from a message set $\mathcal{M}$. The
host data is a sequence $X^N = (X_1, \ldots, X_N)$. A cryptographic
key $K^N = (K_1, \ldots, K_N)$ is available both at the encoder and
the decoder. In particular, $K^N$ enables the use of randomized
codes. The pairs $(X_i, K_i)$ are i.i.d. $p(x, k)$. This model
includes $K = X$ as a special case [1]. The message $M$ is
uniformly distributed over the message set $\mathcal{M}$. The information
hider passes $X^N$, $K^N$, and the message $m$ through an embedding
function $f_N$, producing composite data $\tilde{X}^N$ that are made
publicly available$^1$. Next, the attacker passes $\tilde{X}^N$ through a
random attack channel $Q^N(y^N|\tilde{x}^N)$ to produce corrupted data
$Y^N$, in an attempt to remove traces of $M$.

Both the embedding and the attack are subject to distortion
constraints, respectively $E\, d^N(x^N, f_N(x^N, m, k^N)) \le D_1$
and $E\, d^N(\tilde{x}^N, y^N) \le D_2$, where
$d^N(x^N, y^N) = \frac{1}{N} \sum_{k=1}^{N} d(x_k, y_k)$ is a distortion function on $N$-tuples. Here
$d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ is a bounded, nonnegative function.

The rate of the code is $R = \frac{1}{N} \log|\mathcal{M}|$. The average
probability of error is
$P_{e,N} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} P(\phi_N(Y^N, K^N) \ne m \mid M = m)$,
where $\phi_N$ is the decoding function. A rate $R$ is achievable
for distortions $(D_1, D_2)$ if there is a sequence of codes
$(\mathcal{M}, f_N, \phi_N)$ subject to distortion $D_1$, with rates $R_N > R$ such
that $P_{e,N} \to 0$ as $N \to \infty$, for any attack subject to
distortion $D_2$. The information-hiding capacity $C(D_1, D_2)$ is the
supremum of all achievable rates for distortions $(D_1, D_2)$.

II.  Main  Result 

Consider first memoryless attack channels. Define a covert
channel $\tilde{Q}(\tilde{x}, u|x, k)$ (to be designed by the information hider),
where $u \in \mathcal{U}$ is an auxiliary random variable, $\mathcal{U}$ is an arbitrary
finite alphabet, and $\sum_{x, \tilde{x}, k, u} d(x, \tilde{x})\, \tilde{Q}(\tilde{x}, u|x, k)\, p(x, k) \le D_1$.
Denote by $\tilde{\mathcal{Q}}$ and $\mathcal{Q}$ the sets of admissible covert and attack
channels, subject to the respective distortion constraints $(D_1, D_2)$.

The proof of Theorem 1 relies on a proof of achievability
and a converse for a fixed attack channel and is closely
related to work by Gel’fand and Pinsker [3].

1 $\tilde{X}^N$ is often referred to as the watermarked signal.


III.  Continuous  Alphabets 

The results above can be extended to the case of infinite
alphabets $\mathcal{X}, \mathcal{U}, \mathcal{K}$. The case of Gaussian $X$ ($\sim \mathcal{N}(0, \sigma^2)$) and
squared-error distortion measure $d(x, y) = (x - y)^2$ is of
considerable interest. When $K = X$, the hiding capacity is given
by $C = \frac{1}{2} \log\left(1 + \frac{D_1}{\beta D_2}\right)$ if $D_2 < \sigma^2 + D_1$, and 0 otherwise.
Here $\beta = \left(1 - \frac{D_2}{\sigma^2 + D_1}\right)^{-1}$. The optimal covert channel $\tilde{Q}$ is
given by $\tilde{X} = X + Z$, where $Z \sim \mathcal{N}(0, D_1)$ is independent of
$X$. The optimal attack is the Gaussian test channel from R/D
theory, with distortion level $\min(D_2, \sigma^2 + D_1)$.
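
The Gaussian capacity expression is easy to evaluate numerically. The sketch below is illustrative and rests on stated assumptions: it uses the form $C = \frac{1}{2}\log(1 + D_1/(\beta D_2))$ with $\beta = (1 - D_2/(\sigma^2 + D_1))^{-1}$ for $D_2 < \sigma^2 + D_1$, takes logarithms to base 2 so that the result is in bits, and requires $D_2 > 0$. The function name is hypothetical.

```python
from math import log2


def gaussian_hiding_capacity(sigma2, d1, d2):
    """Hiding capacity in bits for a Gaussian host known at the decoder.

    sigma2: host variance, d1: embedding distortion, d2: attack distortion (> 0).
    """
    if d2 >= sigma2 + d1:
        return 0.0  # the attacker can wipe out the composite signal entirely
    beta = 1.0 / (1.0 - d2 / (sigma2 + d1))
    return 0.5 * log2(1.0 + d1 / (beta * d2))


for d2 in (0.5, 1.0, 2.0, 10.0):
    print(d2, gaussian_hiding_capacity(sigma2=10.0, d1=1.0, d2=d2))
```

When the distortions are small compared with the host variance, $\beta$ is close to 1 and the value approaches $\frac{1}{2}\log_2(1 + D_1/D_2)$, for example about 0.5 bit when $D_1 = D_2$.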

For blind information hiding (no key), the optimal attack
$Q(y|\tilde{x})$ is again the Gaussian test channel, and the optimal
$\tilde{Q}(\tilde{x}, u|x)$ is the same distribution that achieves capacity in
a problem studied by Costa [4]. The capacity is the same
whether or not the host data are known at the decoder.

If $X$ is non-Gaussian with mean zero and variance $\sigma^2$, $C$
above is an upper bound on hiding capacity. For small $D_1$ and
$D_2$ (typical of many information-hiding problems), a remarkable
result arises: the hiding capacity under the squared-error
distortion metric is equal to $\frac{1}{2} \log\left(1 + \frac{D_1}{D_2}\right)$ independently of
the statistics of $X$, asymptotically as $D_1, D_2 \to 0$.

IV.  Further  Extensions 

The results above have been extended to the case of blockwise
i.i.d. $(X_i, K_i)$ and blockwise i.i.d. attacks. If $(X_i, K_i)$
are i.i.d., then the optimal attack is memoryless. The framework
developed in this paper can also be used to analyze the
performance of a variety of information-hiding systems [2].
We consider a cryptographic scenario of two honest parties A and B facing an active
eavesdropper E. They share no secret key initially, but their final goal is to generate a
shared information-theoretically secure key. We develop the special case presented in
[1], where a random binary string is broadcast by some center (like a satellite) over
binary symmetric channels and received as X-, Y-, and Z-strings with bit error probabilities
$\varepsilon_A$, $\varepsilon_B$, $\varepsilon_E$ ($\varepsilon_A < \varepsilon_E$, $\varepsilon_B < \varepsilon_E$) by the legal parties and the intruder, respectively. The
authentication protocol is a procedure that appends some bit positions of X, selected
according to a certain rule, to every message transmitted from A to B. This rule can be chosen as
some binary block code that puts messages and codewords in one-to-one correspondence. Party B
accepts the message as original if and only if the fraction of bits in the received
authenticator that agree with the corresponding bits of his string Y exceeds some fixed
threshold. Otherwise B rejects the message, considering it to be false. It was remarked in
[1] that the distance property of a code used for such authentication differs from the
Hamming distance and should be replaced by the semidistance.

A  simple  construction  of  constant  weight  authentication  codes  based  on  linear 
binary  codes  which  provide  fixed  minimum  code  semidistance  was  given  in  [1]. 

Using this construction we derive formulas to estimate the probability that a
modification of the message by an intruder is not detected by party B and the
probability that B accepts the message if an intruder has not intervened at all. We
propose several methods for designing authentication codes based on
nonlinear codes, which can be more effective in some cases.

Unfortunately, the use of any authentication code as part of a key-sharing procedure
turns out to be inefficient, because it requires authenticators so long that the result is a very
small key rate. The way out of this situation is to consider so-called hybrid
authentication, which is based both on code authentication and on hashing in the Almost
Strong Universal$_2$ class. We prove several statements and derive formulas to
estimate its efficiency.
Abstract - A point-to-point communication network
is represented by $(G, C)$, where $G = (V, \mathcal{E})$ is
a directed graph with vertex set $V$ and edge set $\mathcal{E}$,
and $C = [C_{ij}, (i, j) \in \mathcal{E}]$ is a nonnegative-valued vector.
A vertex in $V$ represents a node in the communication
network, and an edge $(i, j)$ represents a point-to-point
discrete memoryless channel (DMC) from node
i to node j whose capacity is $C_{ij}$. We assume that
the channels in the network are independent of each
other. An information source with entropy rate h is
generated at source node s and recovered at sink node
t with arbitrarily small probability of error. We show
that the value of a max-flow from node s to node t in
$(G, C)$ must be greater than or equal to h. This result
implies a separation theorem for network coding and
channel coding in such a communication network.

I.  Introduction 

A point-to-point communication network can be represented
by $(G, C)$, where $G = (V, \mathcal{E})$ is a directed graph with
vertex set $V$ and edge set $\mathcal{E}$, and $C = [C_{ij}, (i, j) \in \mathcal{E}]$ is a
nonnegative-valued vector. A vertex in $V$ represents a node
in the communication network, and an edge $(i, j) \in \mathcal{E}$ represents
a point-to-point discrete memoryless channel (DMC)
from node i to node j whose capacity is $C_{ij}$. All the channels
in the network are independent of each other. We assume that
there are a source node s and a sink node t in G such that
the information source is generated at node s and recovered
at node t. In the network, there is a dedicated encoder $E_{ij}$
at node i ($i \ne t$) for each output channel $(i, j) \in \mathcal{E}$. Each
encoder $E_{ij}$ receives all the information sent to node i via the
channels $(i', i) \in \mathcal{E}$. At the sink node t, there is a decoder
which recovers the information source.

A code on a network of point-to-point channels can be very
complicated in general, especially if the network is cyclic. In
[1], we define a realizable code which covers almost all possible
codes on a network. A triple $(G, C, h)$ is admissible if there
exists a realizable code on network $(G, C)$ such that information
can be transmitted at rate h from node s to node t with
arbitrarily small probability of error. Define the capacity of a
network $(G, C)$ as the supremum of all h such that $(G, C, h)$
is admissible.

II.  Main  results 

Suppose there exists a realizable code on G such that an
information source with entropy rate h generated at node s
can be recovered at node t with arbitrarily small probability
of error. A cut in G represents a collection of channels which
separates node s and node t. A channel across a cut is called
a forward channel if its direction is from node s to node t;
otherwise it is called a reverse channel. If there is no reverse
channel across the cut, the information source, the inputs of
the channels across the cut, the outputs of the channels across
the cut, and the reproduction of the information source by the
decoder at node t form a Markov chain in this order. By the
data processing theorem, the capacity of the cut (i.e., the total
capacity of forward channels across the cut) must be greater
than or equal to h.

However,  a  cut  may  contain  reverse  channels,  even  if  G  is 
acyclic.  In  this  case,  the  Markov  chain  to  which  we  applied 
the  data  processing  theorem  above  does  not  always  hold.  The 
main  result  in  [1]  is  that  the  capacity  of  any  cut  must  be 
greater  than  or  equal  to  h.  The  following  theorem  resembles 
the  Max-flow  Min-cut  theorem  [2]  in  network  flow  theory. 

Theorem 1 If $(G, C, h)$ is admissible, then the value of a
max-flow from the source to the sink is greater than or equal
to h.

Ahlswede et al. [3] studied the problem in which, for all edges
$(i, j) \in \mathcal{E}$, information can be sent from node i to node j
noiselessly, i.e., $C_{ij} = \infty$. This is the network coding problem
associated with the problem we study in this work, except
that they consider multicasting the information source from
the source node to possibly more than one sink node in the
network. Let $R_{ij}$ be the coding rate of encoder $E_{ij}$ for $(i, j) \in \mathcal{E}$,
and let $R = [R_{ij}, (i, j) \in \mathcal{E}]$. They proved that it is possible
to multicast information at rate h from the source node to each
sink node if and only if the value of a max-flow from the source
node to each sink node in $(G, R)$ is greater than or equal to
h. From this result and Theorem 1, we can determine the
capacity of a network.

Theorem 2 The capacity of a network $(G, C)$ is equal to the
value of a max-flow from node s to node t.

It also follows from this theorem that in our problem, asymptotic
optimality can always be achieved by separating network
coding and channel coding. Generalization of our problem to
multicasting the information source from the source node to
a number of sink nodes is straightforward.
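
As an illustration of Theorem 2, the value of a max-flow from s to t can be computed with a standard augmenting-path method. The sketch below uses the Edmonds-Karp algorithm (breadth-first augmenting paths); the function name `max_flow`, the dictionary representation of the capacity vector, and the example network are assumptions made here for illustration and do not come from the paper.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Value of a max-flow from s to t.

    capacity: dict mapping a directed edge (i, j) to its capacity C_ij.
    """
    residual = {}
    neighbors = {}
    for (i, j), c in capacity.items():
        residual[(i, j)] = residual.get((i, j), 0.0) + c
        residual.setdefault((j, i), 0.0)  # reverse residual edge
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left

        # Bottleneck residual capacity along the path found.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

For the hypothetical network with capacities s→a: 1, s→b: 2, b→a: 1, a→t: 2, b→t: 1, the function returns 3, which by Theorem 2 would be the capacity of that network.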
Abstract - We introduce the real, discrete-time
Gaussian parallel relay network. This simple network
is theoretically important in the context of network
information theory. We present upper and lower
bounds to capacity and explain where they coincide.

I.  Introduction 

In some contexts, cooperation between terminals in a multiple
terminal system can enlarge the set of reliably achievable
rates. For systems where power is of primary importance, such
as in wireless or ad hoc networks, terminals can cooperate by
sending signals with a common component. This common
component coherently combines at a receiver, resulting in an
increased effective power. Exploiting this requires common
information at distributed points and synchronization of the
carriers in a wireless system. Investigating how this can be
accomplished is important for improving both real-world systems
and theoretical understanding of networks.

To this end, we assume that carrier synchronization is feasible
and introduce the real, discrete-time Gaussian parallel
relay network, illustrated in Figure 1. We wish to find the
capacity of the network when the only source of extrinsic
information is encoded into the signal X. The sole purpose of
the relays is to get the information from X to a decoder observing
Y. We assume the noise processes are independent and
are white with variances $N_1$, $N_2$, and $N$. Further, we assume
the network input and relays have average power constraints
$P_X$, $P_1$, and $P_2$. The network is thus parametrized by four
signal-to-noise ratios (SNRs): $S_1 = P_X/N_1$, $S_2 = P_X/N_2$, $S_3 = P_1/N$,
and $S_4 = P_2/N$. This network is similar to the relay channel
introduced in [1] and studied in [2]. It differs via Relay 2,
which provides an important separation between the source
and destination.

II.  Upper  bounds  to  capacity 
Due to the presence of the relays, it is not surprising that
tight upper bounds to network capacity are difficult to determine.
The first upper bound is a result of the data processing
theorem applied to the broadcast side of the network.

$$R \le \frac{1}{n} I(X^n; Y_1^n, Y_2^n) \le \frac{1}{2}\ln(1 + S_1 + S_2). \qquad (1)$$

The second upper bound is more involved and can be derived
almost exactly as in [2] for the physically degraded Gaussian
relay channel.

$$R \le \max_{\alpha \in [0,1]} \min\Big[\tfrac{1}{2}\ln\big((1 + S_1)(1 + (1 - \alpha)S_4)\big),\ \tfrac{1}{2}\ln\big(1 + S_3 + S_4 + 2\sqrt{\alpha S_3 S_4}\big)\Big] \qquad (2)$$

A similar bound holds with $S_2$ in place of $S_1$ and the roles of
$S_3$ and $S_4$ reversed. These bounds are in general tighter than
the data processing bound applied to the multiple access side.
Figure 1: The Gaussian parallel relay network


III.  Achievability  results 
We first present results for the symmetric case $S_1 = S_2$ and
$S_3 = S_4$. We do this for two reasons. First, we reduce the
parametrization from four SNRs to two, thus making presentation
easier. Second, we highlight two fundamentally different
approaches to communicating through this network.

We first consider a natural staggered block coding scheme.
Both relays decode a block of observations and then transmit
identical corresponding codewords (with high probability).
The relays achieve perfect cooperation in this case, but
the scheme is limited since each relay must decode reliably.
This scheme results in reliably achievable rates up to

$$R = \frac{1}{2}\ln(1 + \min[S_1, 4S_3]). \qquad (3)$$

When $S_1 > 4S_3$, (3) and (2) coincide, determining capacity.

The second approach views the signals $Y_1$ and $Y_2$ as independent
observations of the input X. Each relay acts as a
simple transponder, amplifying both signal and noise. If X is
Gaussian, this combines the observations optimally (and the
core signal component X coherently) before the multiaccess
receiver noise Z is added. We can achieve rates up to

$$R = \frac{1}{2}\ln\left(1 + \frac{4 S_1 S_3}{1 + 2S_3 + S_1}\right). \qquad (4)$$

As the multiaccess noise power N becomes relatively small,
i.e., as $2S_3/(1 + 2S_3 + S_1) \to 1$, (4) and (1) coincide, and network
capacity is $\frac{1}{2}\ln(1 + 2S_1)$.
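
The bounds and achievable rates for the symmetric case can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, rates are in nats, the maximization over $\alpha$ is done on a simple grid, and bound (2) is assumed to have $\alpha$ inside the square root of its second term.

```python
import math


def broadcast_bound(s1, s2):
    """Upper bound (1): data processing on the broadcast side."""
    return 0.5 * math.log(1 + s1 + s2)


def relay_bound(s1, s3, s4, steps=1000):
    """Upper bound (2), maximized over alpha on a uniform grid (assumed form)."""
    best = 0.0
    for i in range(steps + 1):
        alpha = i / steps
        first = 0.5 * math.log((1 + s1) * (1 + (1 - alpha) * s4))
        second = 0.5 * math.log(1 + s3 + s4 + 2 * math.sqrt(alpha * s3 * s4))
        best = max(best, min(first, second))
    return best


def decode_forward_rate(s1, s3):
    """Achievable rate (3): both relays decode and re-encode coherently."""
    return 0.5 * math.log(1 + min(s1, 4 * s3))


def amplify_forward_rate(s1, s3):
    """Achievable rate (4): both relays amplify signal plus noise."""
    return 0.5 * math.log(1 + 4 * s1 * s3 / (1 + 2 * s3 + s1))
```

With $S_1 = 10$ and $S_3 = S_4 = 1$, `decode_forward_rate` and `relay_bound` agree, and for very large $S_3$ the value of `amplify_forward_rate` approaches `broadcast_bound(s1, s1)`, in line with the two coincidence statements above.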

Combining  these  approaches  simultaneously  is  inferior  to 
using  the  better  of  the  two  schemes.  However,  time-sharing 
between  schemes  at  different  values  of  Si  and  S3  is  beneficial. 
We  present  these  results  for  a  typical  symmetric  network.  For 
an  asymmetric  network,  coding  schemes  can  be  based  on  more 
general  broadcast  and  multiple  access  approaches.  We  present 
a  number  of  these  generalizations. 


IV.  Conclusion 

Intuition  and  study  of  the  symmetric  network  suggest  that  the 
converses  we  have  derived  are  not  tight  in  general. 
Abstract - In this paper we present a different view
of the broadcast channel that better fits an asynchronous
setting, where each receiver can “listen” to
the broadcast data at different time intervals. In
this scenario, there is a “static” fixed amount of data
that needs to be transmitted to all receivers. Each receiver
wants to minimize the receiving time, and in this
way get the static data at the highest possible rate.

I.  Concept  Definition 

In this work we define and analyze static broadcasting. In
this broadcasting scenario, the sender has only fixed common
information to transmit to all receivers. We suggest the following
definition of the rate: the number of reliably received
bits divided by the number of symbols the receiver has used to
retrieve these bits (or, divided by the information gathering
time). Under this definition, in principle, a receiver that listens
through a better channel may gather fewer channel symbols in
order to estimate the transmitted message, and thereby increase
its rate. In the saved time it can fetch more information
from other transmitters. The term static broadcasting comes
from the notion that the information the transmitter sends is
fixed, static, and the same for all receivers.

In this work, a broadcast channel is composed of a single
transmitter and d memoryless channels $W_i$, $1 \le i \le d$, with
common input alphabet through which the transmitter broadcasts
to d receivers. The capacity region is defined as the
closure of the set of all possible achievable rates. A rate
$(R_1, R_2, \ldots, R_d)$ is said to be achievable if for any $\epsilon > 0$ there
exists a code with M words such that for all i, the ith receiver
can decode, with error probability smaller than $\epsilon$, the
codeword  using  the  first  [log  M/Ri\  channel  symbols.  The 
achievable  rate  region  is  given  by  the  following  theorem. 

Theorem 1 $(R_1, R_2, \ldots, R_d)$ is in the capacity region iff, for
any $\delta > 0$ there exist input priors $P_1, P_2, \ldots$ and a number K
such that A- J(Pt; Wi) > $R_i - \delta$ for all $1 \le i \le d$, where


In  defining  the  capacity  region  for  static  broadcasting  we 
utilized  the  possibility  of  transmitting  the  information  at  a 
higher  rate  if  the  receivers  are  not  forced  to  be  synchronously 
and  simultaneously  connected  to  the  transmitter.  The  fact 
that  there  are  various  possible  definitions  of  the  capacity  for 
the  broadcast  channel,  depending  on  the  subset  of  time  the 
data  is  received,  has  been  pointed  out  in,  e.g.,  [1].  However, 
the  setting  we  propose  is  novel. 

The proposed setting was further extended in [2]. For example,
in [2] there is a setting where the receivers start receiving
at different arbitrary times, which may fit an IP Multicast
scenario. Another extension corresponds to data transmission
over an unknown channel, using infinitely long codes (to allow
a channel with unbounded small capacity). Finally, universal
and sequential decoding schemes were investigated.


II.  Examples  of  the  Capacity  Region 

A general method to find the capacity region for static
broadcasting to 2 channels is as follows. Assume the channels'
conditional probabilities are $W_1(y|x)$, $W_2(y|x)$ and the corresponding
capacities are $C_1$, $C_2$. Using the convexity of the mutual
information, we may assume that the input prior to the
channel is changed only at time points of the form $t = n_i + 1$.
Hence, in the case of 2 receivers, we start with prior P, and after
one of the receivers has received all the information, it will quit, and
in order to maximize the rate to the second receiver, we shall
change the input prior to the one that achieves its capacity.
Assume $I(P; W_1) \ge I(P; W_2)$. Then,

$$R_1 = I(P; W_1), \qquad R_2 = \frac{I(P; W_1)\, C_2}{C_2 + I(P; W_1) - I(P; W_2)}.$$

Of course, any point $r_1 \le R_1$, $r_2 \le R_2$ is in the capacity
region. To get the complete capacity region we should take
the union of the region above over all possible values of the
initial input prior, P.
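
The rate pair above is easy to evaluate for concrete channels. The following sketch does so for two binary symmetric channels; the function names, the use of bits as units, and the crossover probability 0.11 in the usage note are assumptions chosen here for illustration.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_information(q, eps):
    """I(P; W) in bits for a BSC with crossover eps and input prior P(X=1) = q."""
    out_one = q * (1 - eps) + (1 - q) * eps
    return binary_entropy(out_one) - binary_entropy(eps)


def static_rate_pair(i1, i2, c2):
    """Rate pair (R1, R2) for initial prior P, assuming I(P;W1) >= I(P;W2).

    Receiver 1 finishes first; afterwards the prior is switched to the one
    achieving the capacity c2 of the second channel.
    """
    if i1 < i2:
        raise ValueError("expected I(P;W1) >= I(P;W2)")
    return i1, i1 * c2 / (c2 + i1 - i2)
```

For a noiseless channel and a BSC with crossover 0.11 under the uniform prior, `static_rate_pair` returns the two point-to-point capacities, since the uniform prior achieves capacity on both channels.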

In the case where the capacity-achieving prior is the same for
all channels, we can simultaneously achieve their point-to-point
capacities. An example is a pair of binary symmetric channels, one noisy
and one noiseless. In that case a simple code can be shown. Take
any good systematic code for the noisy channel. The systematic
part (the information bits are the prefix of each codeword)
is sufficient, of course, for the noiseless channel and imposes an
effective rate of 1 for that channel. The noisy channel receives
the information at a rate determined by the code.

In static broadcasting, unlike regular broadcasting, time
sharing between two strategies is not a valid strategy. Hence,
the capacity region is not necessarily convex. For example,
suppose one communicates using 31 Japanese words and 31
Hebrew words. A Japanese listener can differentiate 32 different
symbols (since all Hebrew words sound the same to
him) and the same goes for a Hebrew listener. This broadcast
channel leads to the capacity region in the figure below.
I.  Introduction

We  consider  the  problem  of  broadcasting  a  bandlimited  white 
Gaussian  source  on  an  additive  bandlimited  white  Gaussian 
noise  channel  with  two  receivers.  Several  hybrid  digital-analog 
joint  source-channel  codes  are  proposed.  The  design  principle 
is  based  on  bandwidth/power  splitting  and  matched  tandem 
coding.  The  distortion  regions  of  these  codes  are  presented. 

II.  Problem  Statement 

Consider a memoryless Gaussian source with zero
mean and variance $\sigma^2$. The source is to be encoded and transmitted
over a broadcast AWGN channel modeled by $Z_k = Y + V_k$,
where $Y$ is the channel input, and $Z_k$ and $V_k$ are the channel
output and noise for the k-th user, $k = 1, 2$. We assume
that $Z_k$, $Y$, and $V_k$ are all m-dimensional, $E[\|Y\|^2] \le mP$,
the components of $V_k$ are i.i.d. with zero mean and variance
$N_k$, $k = 1, 2$, and $0 < N_1 < N_2$.

An n-dimensional encoder, $\alpha_n$, is a mapping of an n-dimensional
source vector $X$ to an m-dimensional channel input
vector $Y$. Here, $\rho = m/n$ is the bandwidth expansion
factor (or the rate of the system in number of channel uses
per source sample). We assume that $\rho$ is fixed while m and
n grow large. The decoder, $\beta_{n,k}$, for user k is a mapping of
an m-dimensional vector $Z_k$ to an n-dimensional vector $\hat{X}_k$.
Let $D(N_k) = D(\alpha_n, \beta_{n,k}, N_k)$ be the mean-square distortion
between $X$ and $\hat{X}_k$. Shannon’s capacity-rate-distortion limit
dictates that

$$D(N_k) \ge \sigma^2 \left(1 + \frac{P}{N_k}\right)^{-\rho}, \qquad k = 1, 2. \qquad (1)$$

We are interested in the set of all possible pairs
$(D(N_1), D(N_2))$.
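
For orientation, the single-user limit for a Gaussian source over an AWGN channel can be evaluated directly. The sketch below assumes the standard form of that limit, $D \ge \sigma^2 (1 + P/N_k)^{-\rho}$, obtained by equating the Gaussian rate-distortion function with $\rho$ times the AWGN capacity; the function name and argument conventions are hypothetical.

```python
def distortion_lower_bound(sigma2, snr_db, rho):
    """Smallest mean-square distortion allowed by the single-user limit.

    sigma2: source variance
    snr_db: 10*log10(P/N_k), in dB
    rho:    channel uses per source sample (bandwidth expansion factor)
    """
    snr = 10 ** (snr_db / 10)
    return sigma2 * (1 + snr) ** (-rho)
```

With unit source variance and a bandwidth expansion factor of 2, a user at 0 dB cannot do better than a distortion of 0.25, while a user at 20 dB is limited to about $9.8 \times 10^{-5}$.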

III.  Achievable  Distortion  Region 

A pair $(d_1, d_2)$ is an achievable distortion point if there
exists an encoder sequence $\{\alpha_n\}$ and decoder sequences
$\{\beta_{n,1}, \beta_{n,2}\}$ such that $\alpha_n$ satisfies the power constraint and
$\lim_{n \to \infty} D(\alpha_n, \beta_{n,k}, N_k) = d_k$ for $k = 1, 2$. The achievable
distortion region is the collection of achievable distortion
points [1].


IV.  Main  Results 

Several hybrid digital-analog joint source-channel coding systems
are proposed. Details of these systems can be found in
[2]. Fig. 1 shows the encoder for one of these systems (Hybrid 3).
This is valid for $\rho > 1$ (bandwidth expansion). For
$\rho < 1$ (bandwidth compression), a dual of this system can
be used [2]. In Fig. 1, the “Linear Encoder” corresponds to
the analog part of the system. The performance of Hybrid
3 is shown in Fig. 2. Here, $\rho = 2$, $10 \log_{10} P/N_1 = 20$ dB,
and $10 \log_{10} P/N_2 = 0$ dB. Points A and C correspond to
traditional digital coding systems optimized for noise $N_1$ and
$N_2$, respectively. The “Time-Sharing” dash-dot curve is the
time-sharing between these two systems (in linear scale, it is a
straight line between A and C). The “Digital” dash-dot curve
is a purely digital system presented in [3]. Fig. 2 shows that
Hybrid 3 is superior to both “Digital” and “Time-Sharing”
systems. The result shows that the region above and to the
right of the heavy solid line is achievable.

Figure 1: Encoder of Hybrid 3 ($\rho > 1$).

Figure 2: Distortion Performances of Hybrid Digital-Analog Systems.

References 

[1] Z. Reznic, R. Zamir and M. Feder, “Joint source-channel coding
of a Gaussian mixture source over the Gaussian broadcast
channel,” Thirty-Sixth Allerton Conference on Communication,
Control, and Computing, Oct. 1998.

[2] U. Mittal, “Broadcasting, robustness and duality in a joint
source-channel coding system,” Ph.D. Dissertation, Department
of Electrical and Computer Engineering, SUNY at Stony
Brook, 1999.

[3] M. Trott, “Unequal error protection codes: Theory and practice,”
IEEE Information Theory Workshop, Haifa, Israel,
p. 11, June 1996.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT 2000, Sorrento, Italy, June 25-30, 2000


Limits  of  Information,  Markov  Chains,  and  Projection 

Andrew R. Barron
Yale University, Statistics Dept.
Box 208290, New Haven, CT 06520
e-mail: Andrew.Barron@yale.edu


Abstract - The chain rule of information shows that
log densities form Cauchy sequences, convergent in
$L_1$, proving information limits, Markov chain convergence,
and existence of information projections.

Let $D(P\|Q) = E_P \log p(X)/q(X)$, $A = E_P |\log p(X)/q(X)|$,
and $V = \int |p - q|$ be the information divergence, absolute
information divergence, and total variation distance between
probability measures P and Q with density functions p, q with
respect to a dominating measure on a measurable space. The
chain rule and the Pinsker-type inequality $A \le D + \sqrt{2D}$,
deduced from $V \le \sqrt{2D}$ (which implies that if D tends to
zero then so do V and A), allow one to deduce in various
settings that log densities provide Cauchy sequences convergent
in $L_1$, thereby establishing information limits including
Markov chain convergence and information projections.
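
The two inequalities can be checked numerically on finite alphabets. The sketch below is only an illustration: the function name `divergences` is hypothetical, natural logarithms are used, and q is assumed to be strictly positive wherever p is.

```python
import math


def divergences(p, q):
    """Return (D, A, V) for two probability vectors p and q.

    D = sum p log(p/q), A = sum p |log(p/q)|, V = sum |p - q|.
    Assumes q[i] > 0 wherever p[i] > 0.
    """
    D = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    A = sum(pi * abs(math.log(pi / qi)) for pi, qi in zip(p, q) if pi > 0)
    V = sum(abs(pi - qi) for pi, qi in zip(p, q))
    return D, A, V
```

For p = (0.5, 0.3, 0.2) and q = (0.2, 0.5, 0.3), the returned values satisfy both $V \le \sqrt{2D}$ and $A \le D + \sqrt{2D}$.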

I.  Markov  chains 

Let $\{X_n\}$ be Markov with stationary transition probability on
a general state space and let $P_n$ be the distribution of $X_n$.
Theorem 1. Markov Chain Convergence. If $\{X_n\}$ is a reversible
Markov chain with a unique invariant probability distribution
$P^*$, then $\lim D(P_n\|P^*) = 0$ if and only if the sequence
$D(P_n\|P^*)$ is eventually finite.

Proof: Let $D_n = D(P_n\|P^*)$. The chain rule gives $D_m - D_n$,
for $n > m$, as a divergence (between conditional distributions
for $X_m$ given $X_n$), establishing monotonicity and convergence
of $D_n$, so that $D_m - D_n \to 0$ as $n, m \to \infty$, and
thus via the Pinsker-type inequality $E|\log p_m(X_m)/p^*(X_m) -
\log p_n(X_n)/p^*(X_n)| \to 0$, so that $\log p_n(X_n)/p^*(X_n)$ is a
Cauchy sequence, convergent in $L_1$. Fritz [4] used information
inequalities for reversible chains to show the total variation
convergence of $P_n$ to $P^*$, so that $p^*(X_n)/p_n(X_n)$ converges
to 1 in probability. Thus $\log p_n(X_n)/p^*(X_n)$, which we have
shown to be convergent in $L_1$, must have $L_1$ limit equal to 0.
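
The monotone decrease of $D(P_n\|P^*)$ to zero can be observed on a small example. The sketch below uses a hypothetical three-state reversible chain (a birth-death chain with invariant distribution (0.25, 0.5, 0.25)); the chain, the function names, and the starting distribution are assumptions chosen for illustration.

```python
import math

# Transition matrix of a reversible birth-death chain (illustrative choice).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
STATIONARY = [0.25, 0.5, 0.25]  # satisfies detailed balance with P


def step(dist, transition):
    """One step of the chain: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def divergence(p, q):
    """D(p||q) in nats, with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_path(start, steps):
    """List of D(P_n || P*) for n = 0, ..., steps."""
    dist, path = list(start), []
    for _ in range(steps + 1):
        path.append(divergence(dist, STATIONARY))
        dist = step(dist, P)
    return path
```

Starting from a point mass on the first state, the sequence returned by `divergence_path` begins at $\ln 4$, is nonincreasing, and tends to zero.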

II.  Information  Limits 

Let $\mathcal{F}_n$ be a monotone sequence of sigma-fields with limit
$\mathcal{F}_\infty$. Let $P_n$ and $Q_n$ denote the restrictions of P and Q to
$\mathcal{F}_n$, let $p_n$ be the density of $P_n$ with respect to $Q_n$, and let
$D_n = D(P_n\|Q_n)$ for $n = 1, 2, \ldots, \infty$.

Theorem 2. Information Limit. If $\mathcal{F}_n$ is decreasing or if $\mathcal{F}_n$
is increasing and $D(P_n\|Q_n)$ is bounded, then $\log p_n \to \log p_\infty$
in $L_1(P)$ and $\lim_n D(P_n\|Q_n) = D(P_\infty\|Q_\infty)$.

Proof: In the case that $\mathcal{F}_n$ is decreasing, for $n > m$ we
have $D_m - D_n = \int p_m \log (p_m/p_n)\, dQ$, establishing monotonicity,
convergence, and, hence, the Cauchy sequence property,
so that, via the Pinsker-type inequalities, both $\int |p_m - p_n|\, dQ$
and $E|\log p_m - \log p_n|$ tend to 0 as $n, m \to \infty$. Hence $p_n$ is
convergent in $L_1(Q)$ (denote the limit $p_\infty$) and $\log p_n$ is convergent
in $L_1(P)$ with limit $\log p_\infty$. Sets A in $\mathcal{F}_\infty$ are in $\mathcal{F}_n$
for all n with $P(A) = \int_A p_n\, dQ$, so by $L_1(Q)$ convergence,
$P(A) = \int_A p_\infty\, dQ$, that is, the limit $p_\infty$ is indeed the density
between the restrictions of P and Q to $\mathcal{F}_\infty$. For the increasing
case one proceeds in the same manner using the chain rule
to extract Cauchy convergence of $p_n$ in $L_1(Q)$ and $\log p_n$ in
$L_1(P)$ and to identify the limit.

Theorem 2 implies Theorem 1 using the decreasing $\mathcal{F}_n$ generated
by $\{X_n, X_{n+1}, \ldots\}$. The conclusion for the limit of
increasing information is classical, see [1] and the references
cited therein. Our analysis shows the convergence directly
from the chain rule, without appeal to a martingale convergence
theorem. The results for the limit of decreasing information
and the information limit of Markov chains are new.

III.  Information  Projection 

Demonstrating existence of information projections for convex
sets of distributions uses similar techniques. Let $D(C\|p)$
and $D(p\|C)$ denote the infimum of $D(q\|p)$ and of $D(p\|q)$,
respectively, over choices of q in a convex set C. The set C
might not admit a minimizer and one seeks a limit $q^*$ obtained
by sequences of $q_n$ approaching the infimum. Topsoe [7], see
also [3], resolves the $D(C\|p)$ case. Here we state a result for
the $D(p\|C)$ case developed further in the thesis of Li [6].
Theorem 3. Information Projection. Let C be convex and
$D(p\|C)$ finite. There exists a unique $q^*$ (possibly outside of
C) such that every sequence $q_n$ with $D(p\|q_n) \to D(p\|C)$ has
$\log q_n \to \log q^*$ in $L_1(p)$. Thus $D(p\|q^*) = D(p\|C)$. For
all q in C, $c_q = E_p\, q(X)/q^*(X) \le 1$ and, defining the density
$r = (pq/q^*)/c_q$, we have the Pythagorean-like inequality
$D(p\|q) \ge D(p\|q^*) + D(p\|r)$, where via the Pinsker-type inequality
$D(p\|r)$ controls the $L_1(P)$ distance between $\log q$ and
$\log q^*$. Furthermore, if $\int q = 1$ for all q in C, then $\int q^* \le 1$.

Previously, Bell and Cover [2] showed characterizing properties
if $q^*$ is in C. Kieffer [5] showed that if $\{\log q : q \in C\}$ is closed
in $L_1(P)$, then there exists $q^*$ satisfying the key properties.

The proof identifies a sequence $q_n$ in C such that $D(p\|q_n) \downarrow
D(p\|C)$ and $c_{m,n} = E_p\, q_m(X)/q_n(X) \le 1$ for all $n > m$. With
$r_{m,n} = (p q_m/q_n)/c_{m,n}$, one finds $D_m - D_n$ equals $D(p\|r_{m,n})
+ \log 1/c_{m,n}$, so by the Cauchy sequence property, $\log 1/c_{m,n}$,
$D(p\|r_{m,n})$ and hence $E|\log q_m(X) - \log q_n(X)|$ converge to 0
as $n, m \to \infty$. Thus $\log q_n$ is a Cauchy sequence with limit
denoted $\log q^*$ in $L_1(p)$. Further details are in [6].
Abstract - We show that the BIC estimator of
the order of a Markov chain (with finite alphabet)
gives the correct order, eventually almost surely as
the sample size goes to $\infty$, thereby strengthening earlier
consistency results that assumed an a priori bound
on the order. A key tool is a strong typicality result
for Markov sample paths. We also show that the
Bayesian or MDL estimator, of which the BIC estimator
is regarded as an approximation, fails to be consistent
for the uniformly distributed i.i.d. process.

I.  Main  results 

Given a set A of cardinality $|A| < \infty$, denote by $\mathcal{M}_k$ the class of
those probability measures on $A^\infty$ which are Markov of order
at most k, with stationary transition probabilities. Set $\mathcal{M} =
\bigcup_{k=0}^{\infty} \mathcal{M}_k$, where $\mathcal{M}_0$ is the i.i.d. class.

One popular approach to model selection is the so-called
Bayesian Information Criterion (BIC). It suggests estimating
the Markov order by

$$k_{\mathrm{BIC}}(x_1^n) = \arg\min_k \left( -\log \max_{P \in \mathcal{M}_k} P(x_1^n) + \frac{|A|^k (|A| - 1)}{2} \log n \right) \qquad (1)$$

if the observed sample is $x_1^n = (x_1, \ldots, x_n)$.
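
A BIC order estimate of this kind can be computed from transition counts. The sketch below is a minimal illustration under stated assumptions: the maximum likelihood is taken conditionally on the first k symbols, the penalty is $\frac{|A|^k(|A|-1)}{2}\log n$, and the search is capped at a hypothetical `max_k` for practicality (the estimator discussed in the text minimizes over all k). The function names are not from the paper.

```python
import math
from collections import Counter


def max_log_likelihood(x, k):
    """Maximized log-likelihood of x over order-k Markov chains.

    Assumption: the likelihood is conditional on the first k symbols, so the
    maximizing transition probabilities are the empirical frequencies.
    """
    transitions = Counter()
    contexts = Counter()
    for i in range(k, len(x)):
        context = tuple(x[i - k:i])
        transitions[(context, x[i])] += 1
        contexts[context] += 1
    return sum(count * math.log(count / contexts[context])
               for (context, _), count in transitions.items())


def bic_order(x, alphabet_size, max_k):
    """Order k in 0..max_k minimizing -log ML_k + (free parameters / 2) * log n."""
    n = len(x)

    def score(k):
        free_parameters = alphabet_size ** k * (alphabet_size - 1)
        return -max_log_likelihood(x, k) + 0.5 * free_parameters * math.log(n)

    return min(range(max_k + 1), key=score)
```

On simple deterministic inputs the estimate behaves as expected: a constant sequence gives order 0, the alternating sequence 0101... gives order 1, and the period-3 sequence 001001... gives order 2.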

Our  principal  result  is 

Theorem 1 For any stationary ergodic $Q \in \mathcal{M}$, $k_{\mathrm{BIC}}(x_1^n)$
equals $k_0 = \min\{k : Q \in \mathcal{M}_k\}$, eventually almost surely.

The hard part of the proof is to rule out “moderate overestimation”
$k_{\mathrm{BIC}}(x_1^n) \in (k^*, \alpha \log n)$, for suitable $k^* \ge k_0$ and
$\alpha > 0$. A key tool for this is

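Theorem 2 Given a stationary ergodic $Q \in \mathcal{M}$, and $0 < \beta <
1/2$, there exists $\alpha > 0$ such that eventually almost surely, the
k-block types of $x_1^n$, defined by

$$\hat{P}(a_1^k \mid x_1^n) = \frac{1}{n - k + 1} \left|\{ i \in [0, n - k] : x_{i+1}^{i+k} = a_1^k \}\right|, \qquad a_1^k \in A^k,$$

satisfy for all $k \le \alpha \log n$

$$|\hat{P}(a_1^k \mid x_1^n) - Q(a_1^k)| < n^{-\beta} Q(a_1^k), \qquad a_1^k \in A^k. \qquad (2)$$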
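
The k-block type of a sample is the empirical distribution of its length-k windows, normalized by the number $n - k + 1$ of windows. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter


def k_block_type(x, k):
    """Empirical distribution of the overlapping length-k blocks of x.

    Returns a dict mapping each observed block (as a tuple) to its relative
    frequency among the n - k + 1 windows of x.
    """
    windows = len(x) - k + 1
    counts = Counter(tuple(x[i:i + k]) for i in range(windows))
    return {block: count / windows for block, count in counts.items()}
```

For the sample 0101 and k = 2 there are three windows (01, 10, 01), so the block 01 has relative frequency 2/3 and the block 10 has relative frequency 1/3.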

Theorem 2 permits us to restrict attention to “typical sequences”
satisfying (2); for these, the number of possible k-block
types does not grow too fast as $n \to \infty$, and the method
of types leads to suitable probability bounds.

We  also  consider  the  Bayesian  order  estimator 

$$k_{\mathrm{KT}}(x_1^n) = \arg\min_k \{ -\log p_k - \log \mathrm{KT}_k(x_1^n) \} \qquad (3)$$

which is also a minimum description length (MDL) estimator,
see [1]. Here $p_k$ is a prior probability assigned to the class $\mathcal{M}_k$,
and $\mathrm{KT}_k$ is the Krichevsky-Trofimov distribution of order k,
a Dirichlet mixture of measures in $\mathcal{M}_k$. The expression minimized
in (1) is a good approximation to $-\log \mathrm{KT}_k(x_1^n)$ when
k is fixed, but substantially overestimates the latter when k
grows with n, a fact we use in the proof of Theorem 1 to rule
out “gross overestimation” $k_{\mathrm{BIC}}(x_1^n) > \alpha \log n$.

Theorem 3 The estimator (3) is not consistent for the
i.i.d. process with uniform distribution on A, if $p_k$ decreases
subexponentially as $k \to \infty$. Rather, in this case
$k_{\mathrm{KT}}(x_1^n) \to \infty$ almost surely.

The proof depends on the fact that for large k it is likely
that no k-block appears more than once in $x_1^n$, and then
$\mathrm{KT}_k(x_1^n) = |A|^{-n}$.

II.  Discussion 

The key feature of our consistency result, Theorem 1, is that
the minimization over k in eq. (1) is unrestricted. When a prior
bound $k^*$ on the true order is known, and the minimization is
restricted to $k \le k^*$, consistency has been proved by Finesso
[2]. Kieffer [4] proved consistency without such restriction, for
a modified estimator with a larger penalty term; he also raised
the question whether the BIC estimator (1) was consistent.

Theorem 2 appears to be the first strong typicality result
for non-i.i.d. processes that admits block size growth of order
$\log n$; see, however, Flajolet et al. [3] for coin-tossing.

Bayesian inconsistency phenomena similar to Theorem 3 are well known in statistics, though in less natural settings than ours. Theorem 3 gives a natural example in which the “almost” in the theorem on MDL consistency for almost every choice of the parameter, see [1], is non-vacuous. The contrast of Theorems 1 and 3 suggests a deficiency in the usual interpretation of the BIC estimator as an approximation to the Bayesian or MDL estimator.

We  note  that  the  (non-Bayesian)  “normalized  maximum 
likelihood”  version  of  MDL,  see  [1],  is  also  inconsistent  for  the 
uniformly  distributed  i.i.d.  process. 
Abstract: Consider a pair of random variables (X, Y) with distribution P. The probability rank function is defined so that G(x|y) = 1 for the most probable outcome x conditional on Y = y, G(x|y) = 2 for the second most probable outcome, and so on, resolving ties between elements with equal probabilities arbitrarily. The function G was considered in [1] in the context of finding the unknown outcome of a random experiment by asking questions of the form ‘Is the outcome equal to x?’ sequentially until the actual outcome is determined. The primary focus in [1], and the subsequent works [2], [3], was to find tight bounds on the moments $E[G(X|Y)^\theta]$. The present work is closely related to these works but focuses more directly on the large deviations properties of the probability rank function.

I.  Results 

The aim of this work is to determine the large deviation exponent of ln G,

$$\lim_{n\to\infty} n^{-1} \ln P[\ln G(X_n|Y_n) > nL], \qquad (1)$$

for a sequence of pairs of r.v.’s $(X_n, Y_n)$ under various assumptions regarding their distribution. Special instances of this problem correspond to finding the error exponent in source and channel coding problems of information theory. E.g., if we regard $X_n$ as an input of length n to a noisy channel and $Y_n$ as the channel output, $P[\ln G(X_n|Y_n) > nL]$ is the probability of decoding error for a list decoder with list size $e^{nL}$. We begin by noting that the mean of ln G is closely related to the Shannon entropy.

Proposition 1 For (X, Y) a pair of jointly distributed random variables,

$$-\ln(1 + \ln M) + H(X|Y) \le E[\ln G(X|Y)] \le H(X|Y) \qquad (2)$$

where M is the maximum over all y of the size of the range of X conditioned on Y = y.
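A small numerical sketch of Proposition 1 follows. It computes the rank function G(x|y) for a finite joint distribution and returns the three quantities appearing in (2), in nats. The function names and the dictionary representation of the joint distribution are assumptions made for illustration.

```python
import math

def rank_function(cond_probs):
    """Rank outcomes by decreasing probability; ties keep insertion order."""
    order = sorted(cond_probs, key=lambda x: -cond_probs[x])
    return {x: i + 1 for i, x in enumerate(order)}

def proposition1_terms(joint):
    """joint: dict {(x, y): P(x, y)}.

    Returns (lower bound, E[ln G(X|Y)], H(X|Y)) as in (2).
    """
    by_y = {}
    for (x, y), p in joint.items():
        if p > 0:
            by_y.setdefault(y, {})[x] = p
    mean_ln_g = 0.0
    cond_entropy = 0.0
    M = 0
    for row in by_y.values():
        p_y = sum(row.values())
        cond = {x: p / p_y for x, p in row.items()}
        G = rank_function(cond)
        M = max(M, len(cond))            # largest conditional support size
        for x, p in row.items():
            mean_ln_g += p * math.log(G[x])
            cond_entropy -= p * math.log(cond[x])
    lower = cond_entropy - math.log(1 + math.log(M))
    return lower, mean_ln_g, cond_entropy
```

The upper bound reflects the fact that at least G(x|y) outcomes have conditional probability no smaller than P(x|y), so G(x|y) ≤ 1/P(x|y).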

We study large deviations of $\ln G(X_n|Y_n)$ under the assumption that the sequence of functions

$$\varphi_n(\theta) = \frac{1}{n} \ln E[G(X_n|Y_n)^\theta] \qquad (3)$$

converges to a limit $\varphi(\theta)$. We let $R_{\varphi'}$ denote the range of $\varphi'$. Now, the Gärtner-Ellis theorem [4, p. 15] gives

Proposition 2 For any $L \in R_{\varphi'}$,

$$\lim_{n\to\infty} n^{-1} \ln P[\ln G(X_n|Y_n) > nL] = \varphi(\theta_L) - \theta_L \varphi'(\theta_L) \qquad (4)$$

where $\theta_L = \inf\{\theta : \varphi'(\theta) = L\}$.


For the special case where $(X_n, Y_n)$ is a pair of random vectors with i.i.d. components, we recall from [1] that for any $\theta > 0$

$$\lim_{n\to\infty} \varphi_n(\theta) = \varphi(\theta) = \ln \sum_y \Bigl[ \sum_x P(x,y)^{1/(1+\theta)} \Bigr]^{1+\theta} \qquad (5)$$

This yields the source coding error exponent (with side information $Y_n$). The well-known source coding error exponent [5, p. 37] is obtained by omitting the side information term.
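The limit (5) and the exponent in (4) are easy to evaluate for a small joint distribution. The sketch below is illustrative only: the names `phi`, `conditional_entropy` and `exponent` are assumptions, and the derivative $\varphi'$ is approximated by a central difference rather than computed in closed form.

```python
import math

def phi(theta, joint):
    """Right-hand side of (5), in nats; joint is a dict {(x, y): P(x, y)}."""
    inner = {}
    for (x, y), p in joint.items():
        if p > 0:
            inner[y] = inner.get(y, 0.0) + p ** (1.0 / (1.0 + theta))
    return math.log(sum(s ** (1.0 + theta) for s in inner.values()))

def conditional_entropy(joint):
    """H(X|Y) in nats."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * math.log(p / p_y[y])
                for (x, y), p in joint.items() if p > 0)

def exponent(theta, joint, h=1e-6):
    """Return (L, phi(theta) - theta * L) with L = phi'(theta) by central difference."""
    slope = (phi(theta + h, joint) - phi(theta - h, joint)) / (2 * h)
    return slope, phi(theta, joint) - theta * slope
```

At $\theta = 0$ the slope equals H(X|Y), consistent with Proposition 1, and the exponent is zero; for $\theta > 0$ the exponent is negative because $\varphi$ is convex with $\varphi(0) = 0$.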

Another special case of interest is when $X_n$ represents a codeword from a block code with block length n and rate R. Then, $P(x_n) = e^{-nR}$ if $x_n$ is a codeword and 0 otherwise. This distribution is called the code’s empirical distribution and denoted $Q_n$ below. The r.v. $Y_n$ represents the channel output when $X_n$ is transmitted. We recall from [1] that for $\theta > 0$,

$$\varphi_n(\theta) = \theta R - n^{-1} E_0(\theta, Q_n) + o(n) \qquad (6)$$

where $E_0$ is Gallager’s function [6, p. 138] and o(n) is a quantity that goes to zero as n goes to infinity. Proposition 2 now yields the well-known sphere-packing bound for list decoding.

In the case of L = 0, which corresponds to ordinary ML decoding, Proposition 2 may not apply since 0 may not belong to $R_{\varphi'}$. In this case, the Gärtner-Ellis theorem yields only a lower bound.

Proposition 3 Let $\{(X_n, Y_n)\}$ be a sequence of input-output pairs for a noisy channel such that $\{\varphi_n\}$ converges to a limit $\varphi$. Then,

$$\liminf_{n\to\infty} n^{-1} \ln P[\ln G(X_n|Y_n) > 0] \ge -\theta_0 \varphi'_+(\theta_0) \qquad (7)$$

where $\theta_0 = \inf\{\theta : \varphi(\theta) > 0\}$ and $\varphi'_+$ denotes the right-derivative.

It can be shown that this bound is equivalent to the familiar sphere-packing lower bound [6, p. 157], except that it is formulated in terms of code empirical distributions.

References 

[1] E. Arikan, “An inequality on guessing and its application to sequential decoding,” IEEE Trans. Inform. Theory, vol. IT-42, no. 1, pp. 99-105, January 1996.

[2] E. Arikan and N. Merhav, “Guessing subject to distortion,” IEEE Trans. Inform. Theory, vol. IT-44, no. 3, pp. 1041-1056, May 1998.

[3] E. Arikan and N. Merhav, “Joint source-channel coding and guessing with application to sequential decoding,” IEEE Trans. Inform. Theory, vol. IT-44, no. 5, pp. 1756-1769, September 1998.

[4] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation. New York: Wiley, 1990.

[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.

[6] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.


0-7803-5857-0/00/$10.00 ©2000 IEEE.

ISIT 2000, Sorrento, Italy, June 25-30, 2000


Information-theoretic methods in testing the goodness of fit


Laszlo Gyorfi
Dep. of Computer Science and Information Theory
Technical Univ. of Budapest
Stoczek u. 2
H-1521 Budapest, Hungary
e-mail: gyorfi@inf.bme.hu

G. Morvai
Dep. of Computer Science and Information Theory
Technical Univ. of Budapest
Stoczek u. 2
H-1521 Budapest, Hungary
e-mail: morvai@inf.bme.hu

Igor Vajda
Institute of Information Theory and Automation
Acad. of Sciences of the Czech Rep.
Pod vodárenskou věží
CZ-182 08 Prague, Czech Rep.
e-mail: vajda@utia.cas.cz


We present a new approach to evaluating the efficiency of information-divergence-type statistics for testing the goodness of fit. Since the Pitman approach is too weak to detect sufficiently sharply the differences in efficiency of these statistics, attention is focused on the Bahadur efficiency.

We consider the classical statistical model of goodness of fit with independent data $(X_i : i \in N)$ where, under a hypothesis H, $X_i$ is distributed by $\mu$ on an abstract space $(\mathcal{X}, \mathcal{A})$ and, under an alternative A, it is distributed by $\nu \ne \mu$. In addition to $\mu$ and $\nu$, we consider the standard empirical distribution $\mu_n = (\delta_{X_1} + \cdots + \delta_{X_n})/n$ on $(\mathcal{X}, \mathcal{A})$ and the infinite product distributions

$$P = \mu^N, \quad Q = \nu^N \quad \text{on } (\mathcal{X}^N, \mathcal{A}^N).$$

We also consider partitions $\mathcal{P}_n = \{A_{n1}, \ldots, A_{n m_n}\} \subset \mathcal{A}$ of $\mathcal{X}$ with $m_n \uparrow \infty$ and the discrete stochastic $m_n$-vectors $p_n = (p_{nj})$, $q_n = (q_{nj})$ and $\hat{p}_n = (\hat{p}_{nj})$ generated by these partitions and the distributions $\mu$, $\nu$ and $\mu_n$. We are interested in the statistics

$$T_{\phi,n} = D_\phi(\hat{p}_n \mid p_n)$$

which are the $\phi$-divergences of Csiszár for convex $\phi \ge 0$. Particular attention is paid to the information divergence (ID) statistic $I(\hat{p}_n; p_n)$ and the reversed ID statistic $I(p_n; \hat{p}_n)$, and to the classical Pearson statistic $\chi^2(\hat{p}_n; p_n)$ and the reversed Pearson (Neyman) statistic $\chi^2(p_n; \hat{p}_n)$.
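For concreteness, the four statistics can be computed from a sample and a partition as sketched below. This is an illustration under stated assumptions: the helper names and the `cell_of` callable (mapping an observation to the index of its partition cell) are hypothetical, and the conventions 0 ln 0 = 0 and division by an empty cell giving infinity are the usual ones rather than something stated in the text.

```python
import math

def info_divergence(p, q):
    """I(p; q) = sum_j p_j ln(p_j / q_j); +inf if some q_j = 0 < p_j."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj > 0:
            if qj == 0:
                return math.inf
            total += pj * math.log(pj / qj)
    return total

def chi_square(p, q):
    """chi^2(p; q) = sum_j (p_j - q_j)^2 / q_j; +inf if some q_j = 0 < p_j."""
    total = 0.0
    for pj, qj in zip(p, q):
        if qj == 0:
            if pj > 0:
                return math.inf
            continue
        total += (pj - qj) ** 2 / qj
    return total

def empirical_vector(sample, cell_of, m):
    """Relative frequencies of the m partition cells in the sample."""
    counts = [0] * m
    for obs in sample:
        counts[cell_of(obs)] += 1
    return [c / len(sample) for c in counts]
```

With an empty cell in the empirical vector, the reversed statistics `info_divergence(p, p_hat)` and `chi_square(p, p_hat)` are infinite, which illustrates why the reversed versions need separate treatment.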

Our results are formulated for nonatomic $\mu$ and $\nu$, under relatively mild restrictions on the partitions $\mathcal{P}_n$. These restrictions are fulfilled e.g. when $p_{nj} = 1/m_n$, the likelihood ratios $p_{nj}/q_{nj}$ are bounded, and the partitions are nested in the sense $\mathcal{P}_1 \subset \mathcal{P}_2 \subset \ldots$ and generate the $\sigma$-algebra $\mathcal{A}$. Moreover, we consider restrictions on $m_n$ of the type

$$\lim_{n\to\infty} \frac{m_n c_n}{n} = 0 \qquad (1)$$

for nondecreasing sequences $c_n > 0$.

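Our main result is the formula

$$B(\phi_1/\phi_2) = \frac{g_{\phi_1}(D_{\phi_1}(\nu \mid \mu))}{g_{\phi_2}(D_{\phi_2}(\nu \mid \mu))} \, \lim_{n\to\infty} \frac{s_{\phi_2,n}}{s_{\phi_1,n}} \qquad (2)$$

for the Bahadur relative efficiency of the test rejecting H when $T_{\phi_1,n} > c_{\phi_1,n}$ with respect to that rejecting H when $T_{\phi_2,n} > c_{\phi_2,n}$. Here, for $\phi = \phi_1$ and $\phi = \phi_2$, $D_\phi(\nu \mid \mu)$ is the $\phi$-divergence of distributions $\nu$ and $\mu$, and $g_\phi(\varepsilon)$ for $\varepsilon > 0$ is the exponent in the well-known information-theoretic formula for large deviations of the “types” $\hat{p}_n$ in a discrete source of $m_n$ letters distributed by $p_n$, where

$$g_\phi(\varepsilon) = \lim_{n\to\infty} s_{\phi,n} \inf_{\hat{p}_n :\, T_{\phi,n} \ge \varepsilon} I(\hat{p}_n; p_n) \qquad (3)$$

cf. Problem 1.2.11 in [2]. In (3), $s_{\phi,n} > 0$ is an appropriate norming sequence leading to finite $g_\phi(\varepsilon)$.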

The definition (2) exploits the approach developed in [3], and the formula (3) was first proposed in [1]. Obviously, (2) is applicable only when the limits in (2) and (3) exist, but (2) also assumes that

$$\lim_{n\to\infty} E_P T_{\phi,n} = 0 \quad \text{and} \quad \lim_{n\to\infty} T_{\phi,n} = D_\phi(\nu \mid \mu) \quad Q\text{-a.s.} \qquad (4)$$

and that $m_n$ satisfies (1) with $c_n = s_{\phi,n} \ln n$.

We have proved that (4) follows from (1) with $c_n = \ln n$ for $T_{\phi,n}$ equal to $I(\hat{p}_n; p_n)$ and $\chi^2(\hat{p}_n; p_n)$. On the other hand, the first of the conditions (4) cannot hold for $T_{\phi,n}$ equal to $I(p_n; \hat{p}_n)$ and $\chi^2(p_n; \hat{p}_n)$. We found that for their robustified versions $I(p_n; \alpha_n p_n + (1 - \alpha_n)\hat{p}_n)$ and $\chi^2(p_n; \alpha_n p_n + (1 - \alpha_n)\hat{p}_n)$ with $\alpha_n \downarrow 0$ both conditions (4) hold and the original nonrobustified function $g_\phi(\varepsilon)$ obtained from (3) remains valid. The sequences $s_{\phi,n}$ and the functions $g_\phi(\varepsilon)$ for the above mentioned statistics are presented in the Table, together with the values $B(\phi_1/\phi_2)$ for the statistic $T_{\phi_2}$ in the given line and $T_{\phi_1}$ in the preceding line. From [3] it is known that $B(\phi_1/\phi_2) = 0$ for $T_{\phi_1,n} = \chi^2(\hat{p}_n; p_n)$ and $T_{\phi_2} = I(\hat{p}_n; p_n)$, i.e. that the ID test is infinitely more Bahadur efficient than the classical Pearson test. The remaining results of the Table seem to be new. They are negative for the reversed versions of the two formerly mentioned statistics.


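Statistic                                                   $s_{\phi,n}$              $g_\phi(\varepsilon)$   $B(\phi_1/\phi_2)$

$\chi^2(p_n; \alpha_n p_n + (1 - \alpha_n)\hat{p}_n)$       $m_n$                     1
$I(p_n; \alpha_n p_n + (1 - \alpha_n)\hat{p}_n)$            $m_n$                     1                       1
$\chi^2(\hat{p}_n; p_n)$                                    $\sqrt{m_n} / \ln m_n$    2                       0
$I(\hat{p}_n; p_n)$                                         1                         $\varepsilon$           0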
Abstract: In this paper, we present a construction method for obtaining decomposable codes that originate from two-dimensional array codes and are of the form $|a_1 + x| \cdots |a_m + x| a_1 + \cdots + a_m + x + y|$. Many best known codes can be constructed using this method.

I.  INTRODUCTION 

Codes constructed by combining shorter or simpler codes are decomposable and can be decoded with reduced complexity. A new class of decomposable codes presented in this paper is created on the basis of two-dimensional array codes which themselves are decomposable. The construction of the codes, in the form $|a_1 + x| \cdots |a_m + x| a_1 + \cdots + a_m + x + y|$, embraces many existing code structures. This is not just an extension of the existing code construction, but also an opportunity for finding more good codes or constructing the best known codes [3] in a simpler way. Also, because of the use of array codes, their trellis structure and efficient soft-decision decoding algorithm will play a major role in the trellis decoding of the decomposable codes created.


II. CODE CONSTRUCTION

A simple example of two-dimensional array codes is the product code. A product code C is formed by a direct product of two component codes $C_1 = (n_1, k_1, d_1)$ and $C_2 = (n_2, k_2, d_2)$, so it is a decomposable code. The generator matrix, G, of C is represented in the form of a Kronecker product of the generator matrices of its component codes, $G_1$ and $G_2$, i.e.

$$G = G_1 \otimes G_2 = (g^{(1)}_{ij} G_2) \quad \text{or} \quad G = G_2 \otimes G_1 = (g^{(2)}_{ij} G_1),$$

where $G_1 = (g^{(1)}_{ij})$, $G_2 = (g^{(2)}_{ij})$. The new decomposable code C' is constructed by using the generator matrix

$$G' = \begin{pmatrix} G_1 & & & G_1 \\ & \ddots & & \vdots \\ & & G_1 & G_1 \\ G_A & \cdots & G_A & G_A \\ & & & G_B \end{pmatrix} \qquad (1)$$

where $G_A$ and $G_B$ are the generator matrices of component codes $C_A = (n_1, k_A, d_A)$ and $C_B = (n_1, k_B, d_B)$ respectively. Code C' is therefore referred to as a $|a_1 + x| \cdots |a_m + x| a_1 + \cdots + a_m + x + y|$-construction code, with $a_i \in C_1$, $x \in C_A$ and $y \in C_B$. This construction can be viewed as the squaring construction [1] when m = 2, x = 0 and y = 0, and the Turyn [2] or cubing construction [1] when m = 2 and y = 0, i.e. in the form $|a + x| b + x| a + b + x|$.

To optimize a given code, we need to fix any two of the three code parameters, length n, dimension k and minimum distance d, and to improve the third one. In our case, for example, the two component codes $G_A$ and $G_B$ are used to augment the product code C in such a way that the length and minimum distance of the decomposable code C' are kept the same as those of code C, and the dimension of the code C' is greater than that of code C. This means that n' = n, d' = d and k' > k. To this end, we set up criteria for selecting $G_A$ and $G_B$, as follows.

The conditions for choosing $G_A$ and $G_B$ such that the augmented code C' has the same minimum distance as C, i.e. d' = d, are set up for the following cases:

1. When $G_A \ne 0$, $G_B = 0$: $d_A \ge d/n_2$ and $d_{1 \cup A} \ge d/n_2$.

2. When $G_A = 0$, $G_B \ne 0$: $d_B \ge d$.

3. When $G_A \ne 0$, $G_B \ne 0$: $d_A \ge d/n_2$, $d_{1 \cup A} \ge d/n_2$, $d_B \ge d$ and $d_{1 \cup A \cup B} \ge d/n_2$.

Here $d_{1 \cup A}$ and $d_{1 \cup A \cup B}$ are the minimum distances of the union codes $C_1 \cup C_A$ and $C_1 \cup C_A \cup C_B$, respectively.

An efficient search algorithm for optimum decomposable codes can be designed by setting the dimension-improving target according to the table of the best known codes [3], and letting $C_1$ be as small as possible. The use of a small $C_1$ may require a large number of component codes, but reduces the complexity of the search algorithm.
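The Kronecker-product description of the product code can be checked on a toy binary example. The sketch below is illustrative: the helper names are assumptions, the brute-force minimum distance is only practical for very small dimensions, and the last function merely writes out a word of the form |a + x | b + x | a + b + x| over GF(2) without testing any of the distance conditions.

```python
from itertools import product

def kron(A, B):
    """Kronecker product of two binary matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def codewords(G):
    """All codewords of the binary code generated by the rows of G."""
    k = len(G)
    columns = list(zip(*G))
    for msg in product((0, 1), repeat=k):
        yield tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in columns)

def min_distance(G):
    """Minimum nonzero codeword weight, by exhaustive enumeration."""
    return min(sum(c) for c in codewords(G) if any(c))

def construction_word(a, b, x):
    """The word |a + x | b + x | a + b + x| over GF(2) (case m = 2, y = 0)."""
    return ([(ai + xi) % 2 for ai, xi in zip(a, x)]
            + [(bi + xi) % 2 for bi, xi in zip(b, x)]
            + [(ai + bi + xi) % 2 for ai, bi, xi in zip(a, b, x)])

# Single-parity-check code (3, 2, 2) used for both components.
G1 = [[1, 0, 1], [0, 1, 1]]
G = kron(G1, G1)   # generator of the (9, 4, 4) product code
```

The product of two (3, 2, 2) codes has length 9, dimension 4 and minimum distance 2 x 2 = 4, which is what the exhaustive search returns.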
Abstract: We investigate general properties of rectangular codes. The class of rectangular codes includes all linear, group, and many nongroup codes. We define a basis of a rectangular code. This basis gives a universal description of a rectangular code. The rectangular algebra is defined. We show that all bases of a length-2 rectangular code have the same cardinality. Bounds on the cardinality of a basis of a rectangular code are given. We present a simple procedure to get a rectangular basis of a linear code from its generator matrix.

A block code C is a set of words $c = (c_1, \ldots, c_n)$ of length n over an alphabet Q = {0, 1, ..., q − 1}. Given $t \in [1, n - 1]$, split every codeword c into the past $p = (c_1, \ldots, c_t)$ and the future $f = (c_{t+1}, \ldots, c_n)$, i.e., c = pf. A set $C \subseteq Q^n$ is called t-rectangular if the following implication is true [1] (in [2] such a set was called t-separable):

$$p_1 f_1,\ p_1 f_2,\ p_2 f_1 \in C \;\Rightarrow\; p_2 f_2 \in C. \qquad (1)$$

A set $C \subseteq Q^n$ is called rectangular if it is t-rectangular for each t.
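Implication (1) says that two pasts sharing one future must share all futures. A minimal sketch of a checker based on this reformulation follows; the function names and the representation of codewords as strings (or tuples) are assumptions made for illustration.

```python
def is_t_rectangular(code, t):
    """Check implication (1) for a fixed split point t."""
    futures = {}
    for c in code:
        futures.setdefault(c[:t], set()).add(c[t:])
    pasts = list(futures)
    for i, p1 in enumerate(pasts):
        for p2 in pasts[i + 1:]:
            # Pasts with a common future must have identical future sets.
            if futures[p1] & futures[p2] and futures[p1] != futures[p2]:
                return False
    return True

def is_rectangular(code):
    """A code is rectangular if it is t-rectangular for every t in [1, n - 1]."""
    n = len(next(iter(code)))
    return all(is_t_rectangular(code, t) for t in range(1, n))
```

For example, the binary even-weight code of length 3 passes the check, while the set {00, 01, 10} fails it because the word 11 is missing.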

All group, linear, and many famous nonlinear codes are rectangular. Rectangular codes have the following nice property. The minimal trellis of a rectangular code is unique, biproper, and minimizes a number of complexity measures including the Viterbi (or APP) decoding complexity. In addition, the minimal code trellis gives a universal compact representation of a rectangular code. We present another universal compact description of a rectangular code using the proposed notion of a rectangular basis.

Given an arbitrary block code G, a rectangular set that includes G and has the minimum cardinality is called a rectangular closure of G and is denoted by [G]. A rectangular closure [G] is unique. We say that a set G generates a rectangular set C (G is a generating set for C) if [G] = C. A set G is called independent if $g \notin [G \setminus g]$ for any $g \in G$. An independent set B generating a rectangular set C is called a basis of the rectangular set C. It is known [3] how to get a basis of a rectangular set and how to get the rectangular set from its basis.

1 The work was supported by Russian Fundamental Research Foundation (project No 99-01-00840) and by Deutsche Forschungsgemeinschaft.

1. Rectangular Algebra. We define over the set $Q^n$ of words a ternary partial operation of rectangular complement. The set of words with this operation is called rectangular algebra. A rectangular code is a rectangular subalgebra. This allows us to use results of algebra. On the other hand, the rectangular algebra is an interesting example of a universal algebra.

The following theorem gives an upper bound on the cardinality of the rectangular closure of the set G.

Theorem 1 $|[G]| \le 2^{|G|-1}$.
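The rectangular closure can be computed by repeatedly applying the rectangular complement until nothing new appears. The sketch below is a straightforward fixed-point iteration, not the procedure of [3]; the function name and the string representation of words are assumptions.

```python
def rectangular_closure(words):
    """Smallest rectangular set containing the given equal-length words."""
    closure = set(words)
    if not closure:
        return closure
    n = len(next(iter(closure)))
    changed = True
    while changed:
        changed = False
        for t in range(1, n):
            futures = {}
            for c in closure:
                futures.setdefault(c[:t], set()).add(c[t:])
            for p1 in futures:
                for p2 in futures:
                    if p1 != p2 and futures[p1] & futures[p2]:
                        # p2 must inherit every future of p1.
                        for f in futures[p1] - futures[p2]:
                            if p2 + f not in closure:
                                closure.add(p2 + f)
                                changed = True
    return closure
```

For G = {00, 01, 10} the closure is the whole space {00, 01, 10, 11}, so the bound of Theorem 1 is met with equality: 4 = 2^(3−1).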

An important question for any universal algebra is: “Do all bases of a closed set have the same cardinality?”

Conjecture 2 All bases of a rectangular code have the same number of words.

We show that Conjecture 2 is true for codes of length 2.

2. Bounds on Cardinality of a Basis. From Theorem 1 we get

Theorem 3 The cardinality of a basis B(C) of a binary rectangular code C is bounded by

$$\log_2 |C| + 1 \le |B(C)| \le |C|.$$

4. Rectangular basis of a linear code. We present a simple procedure to get a rectangular basis of a linear code from the generator matrix of the code. This basis can be used as follows. Assume that a nonlinear rectangular code C is a union of cosets of a linear code L. Using the proposed procedure we obtain a basis B(L) of the linear code L. A basis of a coset L + a is B(L) + a. So, we can construct a generating set for C as the union of bases of the cosets of L.
Abstract: We construct new cocyclic generalised Hadamard matrices using semifield multiplication. The matrices used are constructed from cocycles defined over elementary abelian groups. These constructions naturally yield generalised Hadamard codes meeting the Plotkin bound.

I.  Introduction 

Non-binary Hadamard codes meeting the Plotkin bound can be constructed using generalized Hadamard matrices [1]. In this paper we construct families of cocyclic generalized Hadamard matrix codes meeting the Plotkin bound from cocycles defined from finite fields $GF(p^m)$, commutative semifields such as Dickson semifields, and non-commutative semifields of order 16.

II.  Cocycles 

Let G be a finite group of order v and C be a finite abelian group of order w where w divides v (w | v). A cocycle is a mapping $\psi : G \times G \to C$ satisfying the following cocycle equation: $\psi(g,h)\,\psi(gh,k) = \psi(g,hk)\,\psi(h,k)$, for all $g, h, k \in G$. This implies $\psi(g, 1) = \psi(1, h) = \psi(1, 1)$, for all $g, h \in G$. We only consider normalized cocycles, for which $\psi(1, 1) = 1$.

A cocycle associated with the groups G and C is naturally represented as a square matrix of order v × v, whose rows and columns are indexed by the elements of the group G under some fixed ordering and whose entry in position (g, h) is $\psi(g,h)$. We call such matrices G-cocyclic matrices. We represent a G-cocyclic matrix as $M_\psi = [\psi(g,h)]_{g,h \in G}$. If the cocycle $\psi$ is symmetric then $M_\psi$ is a symmetric matrix.

Definition 1 When w | v, the cocycle $\psi : G \times G \to C$ is orthogonal if the non-initial rows of $M_\psi$ are uniformly distributed over the elements of C. That is, for each $g \ne 1 \in G$, $|\{h \in G : \psi(g,h) = a\}| = v/w$, for all $a \in C$.
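The cocycle equation and Definition 1 can be verified directly for the simplest example, field multiplication on a prime field. In this sketch G = C is the additive group of integers mod p, so the group operation is written additively and the identity is 0 rather than 1; the function names are assumptions introduced for illustration.

```python
def field_cocycle(p):
    """psi(g, h) = g * h mod p, viewed as a map Z_p x Z_p -> Z_p."""
    return lambda g, h: (g * h) % p

def is_cocycle(psi, p):
    """Additive form of the cocycle equation on G = C = (Z_p, +)."""
    G = range(p)
    return all((psi(g, h) + psi((g + h) % p, k)) % p
               == (psi(g, (h + k) % p) + psi(h, k)) % p
               for g in G for h in G for k in G)

def is_orthogonal(psi, p):
    """Here v = w = p, so every non-initial row must contain each value once."""
    for g in range(1, p):
        row = sorted(psi(g, h) for h in range(p))
        if row != list(range(p)):
            return False
    return True
```

The trivial cocycle (identically 0) also satisfies the cocycle equation but is not orthogonal, which shows that orthogonality is a genuine extra condition.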

III. Generalized Hadamard matrices and related codes

A generalized Hadamard matrix GH(w, v/w) over a group C is a v × v matrix $H = [h_{ij}]$ with entries from the group C of order w, w | v, such that for $i \ne k$ the list of quotients $h_{ij} h_{kj}^{-1}$, $1 \le j \le v$, contains each element of C exactly v/w times. Let H* be the matrix with entries $h^*_{ij} = h_{ji}^{-1}$; then the defining matrix equation over ZC is

$$HH^* = v I_v + (v/w)\Bigl(\sum_{u \in C} u\Bigr)(J_v - I_v), \qquad (1)$$

where $I_v$ and $J_v$ are the v × v identity matrix and the matrix with all entries 1, respectively. Generalized Hadamard matrices can be used directly to construct codes meeting the Plotkin bound. We have the following result.

1This  work  was  supported  by  Australian  Research  Council  Large 
Grant  #A49701206 


Theorem 1 [1, 2] Let $\psi : G \times G \to G$ be an orthogonal cocycle, where G is the additive group of $GF(p^r)$. Let $M_\psi$ be a G-cocyclic matrix of order $p^r \times p^r$ over G. Then

1. the rows of $M_\psi$ without the first column form a $(p^r - 1, p^r, p^r - 1)$ $p^r$-ary code meeting the Plotkin bound;

2. the rows of the translates $a + M_\psi$, $a \in G$, of $M_\psi$ form a $(p^r, p^{2r}, p^r - 1)$ $p^r$-ary code meeting the Plotkin bound.
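The parameters in Theorem 1 can be confirmed by brute force in the smallest case r = 1, using the field-multiplication cocycle on the integers mod p. This is an illustrative sketch under that assumption; the helper names are not from the text.

```python
from itertools import combinations

def cocyclic_matrix(p):
    """M_psi for psi(g, h) = g * h mod p on the additive group of GF(p)."""
    return [[(g * h) % p for h in range(p)] for g in range(p)]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def min_distance(words):
    return min(hamming(u, v) for u, v in combinations(words, 2))

def punctured_code(M):
    """Rows of M without the first column (part 1 of Theorem 1)."""
    return [row[1:] for row in M]

def translate_code(M, p):
    """Rows of all translates a + M, a in GF(p) (part 2 of Theorem 1)."""
    return [[(a + e) % p for e in row] for a in range(p) for row in M]
```

For p = 5 the first code has 5 words of length 4 at distance 4, and the second has 25 words of length 5 at distance 4, matching $(p^r - 1, p^r, p^r - 1)$ and $(p^r, p^{2r}, p^r - 1)$ with r = 1.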

IV. Orthogonal Linearized Polynomial (LP) Cocyclic matrices from Semifields

Throughout this section let G be an elementary abelian group of order $p^r$. Here we construct classes of orthogonal cocyclic matrices using linearized permutation polynomials (LPP) over $GF(p^r)$. Let $L(x) = \sum_{i=0}^{r-1} \lambda_i x^{p^i}$ be a LPP over $GF(p^r)$; then the linearized permutation cocycle (LP cocycle) is given by $\mu_L(g,h) = L(g) \cdot h$, where · represents multiplication in a semifield whose additive group is G. We have the following lemma.

Lemma 1 Let (F, +, ·) be a finite semifield such that $G = (F, +) \cong (GF(p^r), +)$. If $L(x) = \sum_{i=0}^{r-1} \lambda_i x^{p^i}$ is a LPP of $GF(p^r)$, then the LP cocycle defined by $\mu_L(g,h) = L(g) \cdot h$ is orthogonal.

The above construction with · as the field multiplication in $GF(p^r)$ accounts for all (symmetric and asymmetric) orthogonal cocycles for groups of order 4, 8 and 9 [2].

The first order $p^r$ for which there exist semifields which are not fields is 16. There are two such semifields, both non-commutative. These two semifields with the above construction lead to new classes of G-cocyclic generalised Hadamard matrices of order 16.

There is a class of finite commutative semifields called the Dickson semifields, defined when p is odd and the exponent r of the prime power $p^r$ is even. Let F be a two-dimensional vector space over $GF(p^b)$, where p is odd and b > 1, so $(F, +) \cong (Z_p)^{2b}$. Let z be any non-square in $GF(p^b)$. Each field automorphism $\theta$ of $GF(p^b)$ defines a multiplication · on F by $(a,b) \cdot (c,d) = (ac + z b^\theta d^\theta,\ bc + ad)$, which makes F a commutative semifield. The only field property which does not hold is associativity of multiplication. But this implies that the rows of the matrix $M_\mu$ for the field multiplication in $GF(p^{2b})$ cannot be permuted to give the rows for the Dickson semifields, and the corresponding Hadamard codes are distinct.
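The Dickson multiplication can be tried out in the smallest case p = 3, b = 2, giving a semifield of order 81. The sketch below makes several assumptions for illustration: GF(9) is represented as pairs (u, v) standing for u + v·i with i² = −1, the automorphism is the Frobenius map x → x³, and the non-square is z = 1 + i. None of these concrete choices comes from the text.

```python
from itertools import product

P = 3
FIELD = [(u, v) for u in range(P) for v in range(P)]   # GF(9) as u + v*i

def f_add(a, b):
    return ((a[0] + b[0]) % P, (a[1] + b[1]) % P)

def f_mul(a, b):
    # (u1 + v1 i)(u2 + v2 i) with i^2 = -1
    return ((a[0] * b[0] - a[1] * b[1]) % P, (a[0] * b[1] + a[1] * b[0]) % P)

def frobenius(a):
    # x -> x^3 fixes GF(3) and sends i to -i
    return (a[0], (-a[1]) % P)

Z = (1, 1)   # 1 + i, a non-square in GF(9)

def dickson_mul(x, y):
    """(a, b).(c, d) = (ac + z (bd)^theta, bc + ad), with theta = Frobenius."""
    (a, b), (c, d) = x, y
    first = f_add(f_mul(a, c), f_mul(Z, frobenius(f_mul(b, d))))
    second = f_add(f_mul(b, c), f_mul(a, d))
    return (first, second)

SEMIFIELD = list(product(FIELD, repeat=2))   # 81 elements
ZERO = ((0, 0), (0, 0))
```

With e = (0, 1) and w = (0, i) one finds (e·e)·w ≠ e·(e·w), so multiplication is not associative, while every nonzero g still gives a bijection h → g·h, which is the orthogonality property needed for the cocycle.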
Abstract: A new class of DbEC-TbED codes over GF(q) is constructed. For the cases of q = 3, 4, the new codes are better than the Gilbert-Varshamov bound.

Let $r(C) = n - \log_q |C|$ be the redundancy of a linear code C over GF(q), and let $\rho(q,n,d) = \min\{r(D) \mid D$ is a q-ary code of length n and minimum distance d$\}$. It is well known that the asymptotic Hamming bound $\rho(q,n,d) \gtrsim t \log_q n$ holds as $n \to \infty$, where $t = \lfloor (d-1)/2 \rfloor$. The Gilbert-Varshamov bound admits linear (q, n, d) codes which achieve the bound $r(q,n,d) \lesssim (d-2) \log_q n$ as $n \to \infty$.

In coding theory, an important problem is finding sequences of (q, n, d) codes asymptotically exceeding the Gilbert-Varshamov bound, i.e.,

$$r(q,n,d) \prec (d-2) \log_q n.$$

It is well known that single-byte error-correcting and double-byte error-detecting (SbEC-DbED) codes, i.e., codes with minimum distance ≥ 4, have been successfully used in computer memory subsystems. We are interested in designing good DbEC codes and DbEC-TbED codes such that the redundancies are less than the Gilbert-Varshamov bound, and as small as possible. In a previous paper [1], we
constructed a class of DbEC codes over $GF(2^i)$, which have the parameters $n = q^m$, $r \le 2m + \lceil m/3 \rceil + 1$, m = 3, 4, .... Our constructions reduce the code redundancy of [2] by one symbol.

In  [2,  Corollary  6],  a  class  of  DbEC-TbED  codes  were  ob¬ 
tained,  which  have  the  parameters: 


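(1) if m is odd, $H = \{1, x_1, \ldots, x_m, (x_1 + x_2\delta + \cdots + x_m\delta^{m-1} + 0 \cdot \delta^m)^{q^{(m+1)/2 - 1} + 1}, (x_1 + x_2\delta + \cdots + x_m\delta^{m-1} + 0 \cdot \delta^m)^{q^{(m+1)/2} + 1}, (x_1 + x_2\beta + x_3\beta^2)^{q^2+q+1}, \ldots, (x_{3k-2} + x_{3k-1}\beta + x_{3k}\beta^2)^{q^2+q+1}, (x_1 + x_2\gamma + x_3\gamma^2 + x_4\gamma^3)^{q^3+q^2+q+1}, \ldots, (x_{4l-3} + x_{4l-2}\gamma + x_{4l-1}\gamma^2 + x_{4l}\gamma^3)^{q^3+q^2+q+1}\}$, where $m \le 3k$ and $m \le 4l$, and when i > m, let $x_i = 0$;

(2) if m is even, $H = \{1, x_1, \ldots, x_m, (x_1 + x_2\delta + \cdots + x_m\delta^{m-1})^{q^{m/2 - 1} + 1}, (x_1 + x_2\delta + \cdots + x_m\delta^{m-1})^{q^{m/2} + 1}, (x_1 + x_2\beta + x_3\beta^2)^{q^2+q+1}, \ldots, (x_{3k-2} + x_{3k-1}\beta + x_{3k}\beta^2)^{q^2+q+1}, (x_1 + x_2\gamma + x_3\gamma^2 + x_4\gamma^3)^{q^3+q^2+q+1}, \ldots, (x_{4l-3} + x_{4l-2}\gamma + x_{4l-1}\gamma^2 + x_{4l}\gamma^3)^{q^3+q^2+q+1}\}$, where $m \le 3k$ and $m \le 4l$, and when i > m, let $x_i = 0$.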

Let $L = F_q^m$ and let $H = (f_1, f_2, \ldots)^T$ be a parity check matrix; we obtain a code C over GF(q).

Theorem 1 The code C in Construction I has the parameters:

$$n = q^m, \quad d \ge 6, \quad r \le \begin{cases} \frac{5(m+1)}{2} + \lceil m/3 \rceil + \lceil m/4 \rceil, & \text{when } m = 5, 7, 9, \ldots, \\ \frac{5m}{2} + 1 + \lceil m/3 \rceil + \lceil m/4 \rceil, & \text{when } m = 6, 8, 10, \ldots. \end{cases}$$

For q = 3 and 4, these codes are better than the Gilbert-Varshamov bound.

Construction  II:  Consider  q  =  3.  Let  H'  be  the  sequences 
of  all  of  the  polynomials  of  degree  <  2  in  H.  It  is  clear  that 

2.5(m + 1), when m = 3, 5, 7, ...,

2.5m + 1, when m = 4, 6, 8, ....


$n = q^{\lfloor 5(m-1)/6 \rfloor}$, $r \le 2.5m$, m = 4, 6, 8, ....

Another  class  of  DbEC-TbED  codes  were  constructed  in  [2, 
Theorem  5],  which  have  the  parameters: 

n  =  9  .  r< - 2 — "  +  r  yl  +  r  "^"li  rn  =  3, 5, 7,  -  - . 

In  this  paper,  we  will  construct  a  new  class  of  DbEC-TbED 
codes  over  GF(q)  which  have  the  parameters: 


n  =  q 


r  < 


I  ^m  +  [f]  +  ffl 


,  when  m  =  3,5,7,- 
],  when  m  =  4, 6, 8,  - 


Taking H' as the parity check matrix, we obtain a class of codes over GF(3).


Theorem 2 The codes in Construction II have the parameters:

$$n = 3^m, \quad d \ge 6, \quad r \le \begin{cases} 2.5(m+1), & \text{when } m = 3, 5, 7, \ldots, \\ 2.5m + 1, & \text{when } m = 4, 6, 8, \ldots. \end{cases}$$


It is clear that $\{\lfloor 5(m-1)/6 \rfloor \mid m = 4, 6, 8, \ldots\} = \{3, 4, 5, 7, 9, 10, 12, 14, \ldots\}$, and it can be verified that the integers 6, 8, 11, 13, 16, 18, 21, 23, 26, 28, ... are not in this set. Thus, we extend the well-known constructions for m = 6, 8, 16, 18, ....


Construction III: Consider q = 4. Let H'' be the sequence of all of the polynomials of degree ≤ 3 in H. Taking H'' as the parity check matrix, we obtain a class of codes over GF(4).

Theorem 3 The codes in Construction III have the parameters:

$$n = 4^m, \quad d \ge 6, \quad r \le \begin{cases} \frac{5(m+1)}{2} + \lceil m/3 \rceil, & \text{when } m = 3, 5, 7, \ldots, \\ \frac{5m}{2} + 1 + \lceil m/3 \rceil, & \text{when } m = 4, 6, 8, \ldots. \end{cases}$$


Construction I: Let $m \ge 4$ and let $1, \delta, \delta^2, \ldots, \delta^{m-1}$ be a basis of $GF(q^m)$ when m is even, and $1, \delta, \delta^2, \ldots, \delta^m$ be a basis of $GF(q^{m+1})$ when m is odd, respectively. Consider the sequence $H = \{f_1, f_2, \ldots\}$ of polynomials in $F_q[x_1, x_2, \ldots, x_m]$, where,

1 This work was supported in part by the National Science Foundation under Grant NCR-9804973.
Abstract: The classes of convolutional codes over finite Abelian groups which admit minimal encoders or systematic encoders are first characterized and then compared.

I. Introduction

Codes over rings and groups have attracted much attention in recent years for their potential use in phase modulation coding [1]. Here we study convolutional codes over finite Abelian groups, presenting necessary and sufficient conditions under which they admit minimal or systematic encoders.

II.  Convolutional  codes  and  encoders 

Given a finite Abelian group V, $C_V$ is the group of Laurent sequences over V (sequences which are equal to 0 sufficiently far in the past). If V and W are finite Abelian groups, any element $N(D) = \sum_i N_i D^i \in \hom(W, V)[[D]]$ induces a homomorphism (shift operator) $N(D) : C_W \to C_V$ by letting D act as the forward translation. N(D) is called rational if there exists $p(D) \in Z[D]$ such that $p(D)N(D) \in \hom(W, V)[D]$. Rational shift operators are exactly those which admit a state realization with finite state space [2].

A convolutional code (c.c. from now on) over V is any subgroup $C \subseteq C_V$ for which there exists another finite Abelian group W and a rational and injective shift operator $N(D) : C_W \to C_V$ such that C coincides with the image of N(D). The shift operator N(D) is said to be an encoder for C. A c.c. admits infinitely many encoders, but they all have, up to isomorphism, the same domain W, which will be denoted by W(C) and called the encoding group of C [2].

Let $C \subseteq C_V$ be a c.c. and let $C_-$ (resp. $C_+$) be the
subgroup of $C$ consisting of the sequences which are 0 at
$t \ge 0$ (resp. $t < 0$). Define the input group of $C$ as
$U(C) := \{x \in V : \exists\, v \in C_+,\ v(0) = x\}$, and the state
group of $C$ as the quotient group $X(C) := C/(C_- \oplus C_+)$.
Let $N(D) : C_{W(C)} \to C_V$ be an encoder for $C$. It can be
shown [2] that $W(C)$ and $U(C)$ have the same cardinality.
Moreover, $N(D)$ admits a state space realization with minimal
state space $X(N)$, whose size represents the amount of memory
needed to implement $N(D)$ on-line. It is a standard result that
$X(N)$ cannot be smaller than $X(C)$.

With no loss of generality we will assume in the sequel that
for any $x \in V$ there exists $v \in C$ such that $v(0) = x$.

III.  Minimal  and  systematic  group  behaviors 

We now introduce two important classes of c.c. A c.c.
$C \subseteq C_V$ is said to be minimal if it admits an encoder
(called minimal) $N(D)$ such that $X(N)$ is isomorphic to $X(C)$.
A c.c. $C \subseteq C_V$ is said to be systematic if it admits an
encoder (called systematic) $N(D) : C_W \to C_V$ of the following
type: $V$ can be split as $V = W \oplus V'$ and there exists
$\tilde{N}(D) : C_W \to C_{V'}$ such that
$N(D)w = (w, \tilde{N}(D)w)$.

In the field case it is well known that any c.c. is systematic
and minimal. In the group case there are examples of
c.c. which are not minimal. On the other hand, it can be
shown that systematic encoders are always minimal, so that
a systematic c.c. is always minimal. The following theorem
provides a characterization of systematic c.c. which extends a
result given in [1].

Theorem 1 Let $C \subseteq C_V$ be a c.c. The following conditions
are equivalent.

1. $C$ is systematic.

2. There exists a subgroup $V'$ of $V$ such that $V = U(C) \oplus V'$.

Condition  2.  can  be  checked  in  a  very  efficient  way  once 
we  have  the  c.c.  represented  as  the  image  of  an  encoder. 
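Condition 2 asks whether the input group is a direct summand of $V$. The following is a minimal brute-force sketch of that test for small groups; it is not the efficient check the text alludes to. It assumes $V$ is given as a product of cyclic groups $\mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_k}$ and the subgroup by a list of generators; the names `span` and `has_complement` are hypothetical.

```python
from itertools import combinations, product

def span(gens, moduli):
    """Subgroup of Z_m1 x ... x Z_mk generated by gens (closure under addition)."""
    zero = tuple(0 for _ in moduli)
    group, frontier = {zero}, [zero]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = tuple((a + b) % m for a, b, m in zip(x, g, moduli))
            if y not in group:
                group.add(y)
                frontier.append(y)
    return frozenset(group)

def has_complement(u_gens, moduli):
    """True if the subgroup U = <u_gens> satisfies V = U (+) V' for some subgroup V'."""
    elements = list(product(*(range(m) for m in moduli)))
    zero = tuple(0 for _ in moduli)
    U = span(u_gens, moduli)
    target = len(elements) // len(U)          # required order of the complement
    # Every subgroup of a product of k cyclic groups needs at most k generators.
    for r in range(len(moduli) + 1):
        for gens in combinations(elements, r):
            S = span(gens, moduli)
            # Trivial intersection plus matching orders gives a direct sum.
            if len(S) == target and U & S == {zero}:
                return True
    return False

print(has_complement([(2,)], (4,)))       # {0, 2} inside Z_4
print(has_complement([(2, 1)], (4, 2)))   # <(2, 1)> inside Z_4 x Z_2
```

In the first example the subgroup $\{0, 2\}$ of $\mathbb{Z}_4$ has no complement, while in the second $\langle (2,1) \rangle$ is complemented by $\langle (1,0) \rangle$ in $\mathbb{Z}_4 \times \mathbb{Z}_2$.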

The relation between minimal and systematic c.c. is clarified
by the following result. First we introduce a transformation
which can be performed on a code. Fix $N \in \mathbb{N}$ and
consider the map $P^N : C_V \to C_{V^N}$ defined by
$P^N(v)(t) := v|_{[tN,\ tN+N-1]}$. If $C \subseteq C_V$ is a c.c.,
$C^N := P^N(C) \subseteq C_{V^N}$ is a c.c., too.
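As an illustration of the blocking map $P^N$, here is a small sketch that groups a finitely supported sequence into windows of $N$ consecutive symbols. Representing a sequence as a dictionary from time index to symbol (missing times meaning 0) is an assumption made for the example, and `block_map` is a hypothetical name.

```python
def block_map(v, N, zero=0):
    """Blocked sequence: time t carries the window (v(tN), ..., v(tN + N - 1))."""
    if not v:
        return {}
    t_min, t_max = min(v) // N, max(v) // N   # floor division also handles negative times
    out = {}
    for t in range(t_min, t_max + 1):
        window = tuple(v.get(t * N + k, zero) for k in range(N))
        if any(x != zero for x in window):    # store only non-zero windows
            out[t] = window
    return out

v = {-1: 3, 0: 1, 1: 2, 4: 1}                 # a sequence over Z_4, zero elsewhere
print(block_map(v, 2))                        # {-1: (0, 3), 0: (1, 2), 2: (1, 0)}
```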

Theorem 2 Let $C \subseteq C_V$ be a c.c. The following conditions
are equivalent.

1. $C$ is minimal.

2. There exists $N \in \mathbb{N}$ such that $C^N$ is systematic.

In  certain  situations  the  classes  of  minimal  and  systematic 
codes  do  coincide. 

Theorem 3 Let $C \subseteq C_V$ be a code and assume that $W(C)$
is a $\mathbb{Z}_n$-free module for some integer $n$. Then the
following conditions are equivalent.

1. $C$ is minimal.

2. $C$ is systematic.

3. $U(C)$ is $\mathbb{Z}_n$-free.

On the other hand, through computer search, we have
found a minimal c.c. with $W(C) = \mathbb{Z}_4 \oplus \mathbb{Z}_2$ and V = Z|,
which is not systematic.
Abstract — A general methodology to analyze convolutional
codes over block fading channels is presented. Starting from
this approach some good generator polynomials for different
block fading channels are obtained.

I.  Methodology  and  Assumptions 

We assume a block fading channel [1, 2], where the fading
level is constant over $B$ encoded bits. The number of
blocks $L$ is the available amount of diversity provided by the
channel. The achievable diversity per dimension depends on
the code rate [1]. The codeword error probability (CEP) for
terminated-trellis convolutional codes over the block fading
channel is obtained from a suitably defined matrix $\mathcal{A}$
that takes into account the trellis structure and the
interleaving function.

Let us consider, as in [3], the $m \times m$ matrix $A(D)$ (where
$m$ is the number of trellis states), whose elements are
$A_{ij} = D^h$ if a transition from state $i$ to state $j$ exists
and produces an output with Hamming weight $h$, and 0 otherwise.
Assume for the sake of simplicity a rate $1/n$ code. If the
fading level were constant along the codeword of $N \cdot n$
encoded bits, we would observe that:

Obs. 1 The matrix $\mathcal{A}(D) = A^N(D)$ has elements $A^N_{ij}$
that take into account all the transitions from state $i$ to
state $j$ with $N$ input bits.

Obs. 2 For zero tailing the element $A^N_{11}$ (the entry that
corresponds to state 0) is sufficient to obtain the code weight
distribution.
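The two observations can be tried out on a small code. The sketch below is only an illustration under stated assumptions: it uses the 4-state rate-1/2 code with octal generators (5, 7) rather than any code from the paper, indexes states from 0 (so the all-zero state is entry `[0][0]`), and stores a polynomial as a dictionary from exponent tuples to coefficients. With `L = 1` it returns the zero-state entry of the $N$-th power of the one-step matrix; with `L > 1` each block of `N // L` consecutive transitions gets its own variable, which is one way to realise the uninterleaved block-fading case. All function names are hypothetical.

```python
from collections import defaultdict

def poly_mul(p, q):
    """Product of polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return dict(out)

def poly_add(p, q):
    out = defaultdict(int, p)
    for e, c in q.items():
        out[e] += c
    return dict(out)

def mat_mul(A, B):
    m = len(A)
    C = [[{} for _ in range(m)] for _ in range(m)]
    for i in range(m):
        for k in range(m):
            if A[i][k]:
                for j in range(m):
                    if B[k][j]:
                        C[i][j] = poly_add(C[i][j], poly_mul(A[i][k], B[k][j]))
    return C

def one_step(gens, memory, L, block):
    """One-step matrix whose output weight h is charged to the variable of `block`."""
    m = 2 ** memory
    A = [[{} for _ in range(m)] for _ in range(m)]
    for s in range(m):
        for u in (0, 1):
            reg = (s << 1) | u                      # newest input bit is the LSB
            h = sum(bin(reg & g).count("1") % 2 for g in gens)
            exps = tuple(h if l == block else 0 for l in range(L))
            A[s][reg & (m - 1)] = {exps: 1}
    return A

def zero_state_entry(gens, memory, N, L):
    """Zero-state entry of the product of per-block powers (N // L steps per block)."""
    m = 2 ** memory
    one = {(0,) * L: 1}
    acc = [[(one if i == j else {}) for j in range(m)] for i in range(m)]
    for block in range(L):
        step = one_step(gens, memory, L, block)
        for _ in range(N // L):
            acc = mat_mul(acc, step)
    return acc[0][0]

# Constant fading (L = 1): weight distribution of the terminated code, N = 5.
print(zero_state_entry((0o5, 0o7), 2, 5, 1))   # 1 + 3 D^5 + 3 D^6 + D^7
# Two fading blocks with two transitions each.
print(zero_state_entry((0o5, 0o7), 2, 4, 2))
```

For `N = 5` and `L = 1` the eight terminated codewords have weights 0, 5, 5, 5, 6, 6, 6 and 7, which is what the first call reports.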

For the block fading channel the matrix $\mathcal{A}$ can be
generalized as a combination of matrices $A(D_1, \ldots, D_n)$ with
elements $A_{ij} = D_1^{h_1} \cdots D_n^{h_n}$, with $h_l = 0, 1$,
which means that the transition from state $i$ to $j$ produces
the output $(h_1, \ldots, h_n)$. $A_{ij}$ is equal to 0 if no
transition exists from $i$ to $j$.

For uninterleaved convolutional codes over the block fading
channel we have

$$\mathcal{A} = \prod_{i=1}^{L} A^{N/L}(D_i, \ldots, D_i) \qquad (1)$$

where  N/L  is  the  number  of  transitions  per  block.  In  the  case 
of  branch-interleaving,  the  expression  for  the  matrix  A  is 


fading level. So,

$$\mathcal{A}_{11} - 1 = T(D_1, D_2, \ldots, D_L) = \sum_{i_1} \cdots \sum_{i_L} w(i_1, \ldots, i_L)\, D_1^{i_1} \cdots D_L^{i_L} \qquad (4)$$

where $T(D_1, D_2, \ldots, D_L)$ is the generalized transfer function.

Upper-bounding the complementary error function as
$\mathrm{erfc}\sqrt{x+y} \le \mathrm{erfc}\sqrt{x} \cdot e^{-y} \le e^{-(x+y)}$
and averaging over fading gives the bound on CEP:


$$\mathrm{CEP} \le \sum_{i_1} \cdots \sum_{i_L} w(i_1, \ldots, i_L) \cdot \frac{1}{2} \prod_{l=1}^{L} \frac{1}{1 + i_l \gamma} \qquad (5)$$


£  =M(  f[  *)-  (6) 

(•1... ■*!.)€/„  1=1,  hA0 

where the sums in (5) are for $(i_1, \ldots, i_L) \ne (0, \ldots, 0)$, $\gamma$ is the
average signal to noise ratio and $I_\alpha$ is the set of $(i_1, \ldots, i_L)$
with $\alpha$ non zero elements. This bound can be also derived
from  \(A\i  —  1),  which  is  half  the  generalized  transfer  func¬ 
tion  T(D\, ...  Dl),  substituing  1  with  1;  a  term  £>(*  with 

1  —  y  1  lCh- ;  and  the  other  L  —  1  terms  Df  with  777^  ■ 

It is worth noting that the low degree terms in (4) give the
diversity order, $\alpha$, achievable by a given coding scheme over
the block fading channel. Moreover, these allow an asymptotic
evaluation of the average CEP. As these terms can be
directly derived from the matrix $\mathcal{A}$, a comparison among
different convolutional codes is possible in order to design
good codes. So we can find the best codes given $L$, the
interleaving strategy and the codeword length. To perform an
efficient search a suitable decomposition of $\mathcal{A}$ has been
developed. As an example, for a rate 1/2, 64-state code with
bit interleaving and $N = 194$, the optimum generator
polynomials for $L = 8$ are $(127, 155)_8$. An asymptotic gain of
0.3 dB in terms of signal to noise ratio with respect to the
optimum generator polynomials for AWGN has been verified by
simulations. These generators are optimum for any $N$ larger
than 40. Numerical results will be presented at the conference.


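$$\mathcal{A} = \left[\prod_{i=1}^{L} A(D_i, \ldots, D_i)\right]^{N/L}. \qquad (2)$$

For the bit-interleaved case the expression becomes

$$\mathcal{A} = \left[\prod_{i=1}^{L/n} A(D_{(i-1)n+1}, \ldots, D_{in})\right]^{B} \qquad (3)$$

The element $\mathcal{A}_{11}$ gives information about all sequences
starting from and ending in state 0; $D_l$ is related to the $l$-th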
Abstract — In this paper, we concentrate on the
study of combining the optimality with respect to
unequal error protection and canonicity of generator
matrices for convolutional codes. The transformation
which can keep the optimality of generator matrices is
constructed, based on which a procedure for obtaining
a basic and optimal generator matrix with the
smallest external degree is also proposed. Moreover,
necessary and sufficient conditions for a canonical
generator matrix whose separation vector is the greatest
among all canonical generator matrices are given.
Finally, the existence of the greatest separation vector
among all canonical generator matrices is proved for
some convolutional codes.

In a previous paper [1], we showed that every convolutional
code has at least one optimal generator matrix with respect
to unequal error protection. A procedure for converting an
arbitrary optimal generator matrix to a basic [2] polynomial
generator matrix (PGM) without affecting its optimality was
also proposed. However, by a counter-example, we showed
that not every convolutional code has an optimal generator
matrix which is also canonical [2]. Since the external
degree [2] of a PGM corresponds to the number of memory
elements in the direct-form realization of this PGM, it is
desirable, in order to reduce the hardware complexity, to
generate a basic and optimal generator matrix of the smallest
external degree.

To  obtain  the  transformation  between  optimal  generator 
matrices,  we  first  define  an  effectively  lower-triangular  matrix. 

Definition 1 Let $G(D)$ be a generator matrix of an $(n,k)$
convolutional code. Assume the components of the separation
vector [1] $\mathbf{s}(G(D))$ are nondecreasingly ordered and have $a$
distinct values, each with $p_i$ repetitions for all $1 \le i \le a$.
For a $k \times k$ matrix $T(D)$ over $F(D)$, where $F(D)$ is the
rational field over a field $F$, let $t_{u,v}(D)$ be the entry in
position $(u,v)$ of $T(D)$ for all $1 \le u,v \le k$. $T(D)$ is called
effectively lower-triangular with respect to $G(D)$ if and only if

$$t_{u,v}(D) = 0$$

for all $\sum_{j=1}^{i-1} p_j < u \le \sum_{j=1}^{i} p_j$, $v > \sum_{j=1}^{i} p_j$, and $1 \le i \le a$.

Based on effectively lower-triangular matrices, necessary and
sufficient conditions for the transformation between all optimal
and basic generator matrices are given as follows.

Theorem 1 Given an $(n,k)$ convolutional code $C$, let $G(D)$
be an optimal and basic generator matrix of nondecreasing
separation vector. For any $k \times k$ nonsingular matrix $T(D)$ over
$F(D)$, $T(D) \cdot G(D)$ is optimal and basic if and only if $T(D)$
is unimodular and effectively lower-triangular with respect to
$G(D)$.

¹This work was supported by the National Science Council of

Based on Theorem 1, a procedure for obtaining a basic and
optimal generator matrix which has the smallest external
degree is proposed.

In addition, some properties of canonical PGM's for UEP
are discussed below. If there exists a canonical PGM of the
greatest separation vector, the corresponding necessary and
sufficient conditions are given in Theorem 2.

Theorem 2 Consider an $(n,k)$ convolutional code $C$. Define
$w(C) = \{w(c(D)) : \forall\, c(D) \in C\}$ and
$C_\rho = \{c(D) : \forall\, c(D) \in C \text{ and } w(c(D)) < \rho\}$.
Without loss of generality, assume the components of the
separation vectors corresponding to the following generator
matrices are nondecreasingly ordered. A generator matrix $G(D)$
has the greatest separation vector among all canonical
generator matrices if and only if $\forall\, \rho \in w(C)$, for
any canonical generator matrix $A(D)$ of $C$ satisfying

$$\langle C_\rho \rangle \subseteq \langle a_1(D), a_2(D), \ldots, a_i(D) \rangle$$

we have

$$\langle C_\rho \rangle \subseteq \langle g_1(D), g_2(D), \ldots, g_i(D) \rangle$$

where $G(D)$ and $A(D)$ have rows $g_i(D)$'s and $a_i(D)$'s for all
$1 \le i \le k$, respectively.

Although we have shown that every convolutional code has an
optimal matrix, the existence of a canonical PGM whose
separation vector is the greatest among all canonical
PGM's is still in doubt. Instead of a general proof, in Theorem
3 we show the existence of a canonical PGM with the greatest
separation vector for the convolutional codes of $k < 3$.

Theorem 3 Let $G(D)$ and $G'(D)$ be canonical generator
matrices of an $(n,k)$ convolutional code $C$ with $k < 3$. If
$\mathbf{s}(G(D))$ and $\mathbf{s}(G'(D))$ are not comparable, there exists
another canonical generator matrix $G^*(D)$ and two permutations
$\phi$ and $\phi'$ of vector components such that

$$\mathbf{s}(G^*(D)) > \phi(\mathbf{s}(G(D))) \quad \text{and} \quad \mathbf{s}(G^*(D)) > \phi'(\mathbf{s}(G'(D))).$$

Finally,  following  a  similar  proof,  the  result  of  Theorem  3  can 
be  directly  extended  to  the  convolutional  codes  whose  optimal 
generator  matrix  has  distinct  components  in  the  separation 
vector. 
Abstract — Performance bounds for maximum likelihood
decoding of convolutional codes over memoryless
channels are commonly measured using the first
few terms of the series expansion of the transfer function
$T(x,y)$. In this paper we present an efficient algebraic
method to obtain this truncated series without
first computing the complete $T(x,y)$.

I.  The  Path  Weight  Enumerators 

Let $S$ be the set of all paths of a rate-$1/n$ convolutional
code with constraint length $K$ that diverge from the all-zero
path at $t = 0$ and remerge into the all-zero path at some later
time. Let $w_1$ and $w_2$ be weight functions such that $w_1(\sigma)$ and
$w_2(\sigma)$ are the number of 1's in the input and output sequence,
respectively, corresponding to a state sequence $\sigma \in S$. $T(x,y)$
is the generating series for the set $S$ with respect to $w_1$ and
$w_2$, that is, $T(x,y) = \sum_{\sigma \in S} x^{w_1(\sigma)} y^{w_2(\sigma)}$.
The number of paths in $S$ of Hamming weight $d$ is the
coefficient of $y^d$ in $T(1,y)$, and the total number of nonzero
information bits in all paths of Hamming weight $d$ in $S$ is
the coefficient of $y^d$ in $\{\partial T(x,y)/\partial x\}_{x=1}$.

The first step to compute $T(x,y)$ is to generate the adjacency
matrix $A$ as follows. The $(i,j)$th entry of $A$ is either
$[A]_{i,j} = x^{w_1(i \to j)} y^{w_2(i \to j)}$, where $w_1(i \to j)$ and
$w_2(i \to j)$ are the Hamming weights of the input and output
strings on the branch that connects the states $i$ and $j$,
respectively, or zero, if $i$ and $j$ are not connected. All state
sequences in $S$ have the following structure: the first symbol
is 0, the second is 1, the third is either 2 or 3, and so on;
the second last symbol is $2^{K-2}$, and the last symbol is 0.
Define a non-zero path as a path which does not enter or leave
the zero state. Let $T_1(x,y)$ be the generating series that
enumerates non-zero paths from the initial state 1 to the
terminal state $2^{K-2}$ with respect to $w_1$ and $w_2$. Thus

$$T(x,y) = [A]_{0,1}\, T_1(x,y)\, [A]_{2^{K-2},0}. \qquad (1)$$

Let $A(0)$ be a matrix identical to its counterpart $A$, except
that the first row and the first column are set to zero. Then

$$T_1(x,y) = \left[(I - A(0))^{-1}\right]_{1,2^{K-2}}. \qquad (2)$$

The $(1, 2^{K-2})$th entry of the $k$th power of $A(0)$ is a bivariate
polynomial whose exponents are Hamming weights $w_1(\sigma)$
and $w_2(\sigma)$ of all non-zero paths originating in state 1 and
terminating in state $2^{K-2}$, and the coefficients are the
multiplicity of the weights. It is necessary to invert a
$2^{K-1} \times 2^{K-1}$ symbolic matrix in order to find a closed
form expression for $T_1(x,y)$. We propose next an iterative
procedure for calculating $T_1(x,y)$, called state reduction
algorithm, that discards, at each step, all paths with Hamming
weight higher than a given order. We need the following
definitions:

¹This work was supported by CNPq under Grant 300987/96-0.


Definition 1: Two finite state machines (FSM) are said to be
equivalent if and only if their transfer functions are identical.
Definition 2: Two FSM are said to be equivalent of order
$L_m$ if and only if the series expansion of $T(x,y)$ and
$\{\partial T(x,y)/\partial x\}_{x=1}$ of order $L_m$ and lower are the
same for the two FSM.
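Equations (1) and (2) can be checked numerically on a small code without any symbolic inversion, by summing the powers of $A(0)$ and dropping every term whose $y$-degree exceeds the chosen order. This is a plain truncated Neumann series, not the state reduction algorithm, and it is only a sketch under stated assumptions: the rate-1/2, $K = 3$ code with octal generators (5, 7) is our choice of example, the shift register convention (newest bit in the least significant position) is assumed, and the loop terminates only if no cycle outside the zero state has output weight 0. The function names are hypothetical.

```python
from collections import defaultdict

def pmul(p, q, L):
    """Product of polynomials {(deg_x, deg_y): coeff}, dropping y-degree above L."""
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            if b + e <= L:
                out[(a + d, b + e)] += c * f
    return dict(out)

def padd(p, q):
    out = defaultdict(int, p)
    for key, c in q.items():
        out[key] += c
    return dict(out)

def adjacency(K, gens):
    """[A]_{i,j} = x^{w1} y^{w2} for the branch i -> j of a rate-1/n code."""
    n = 2 ** (K - 1)
    A = [[{} for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for u in (0, 1):
            reg = (s << 1) | u                     # assumed convention: newest bit is the LSB
            wy = sum(bin(reg & g).count("1") % 2 for g in gens)
            A[s][reg & (n - 1)] = {(u, wy): 1}
    return A

def truncated_transfer(K, gens, L):
    A = adjacency(K, gens)
    n, last = len(A), 2 ** (K - 2)
    # A(0): first row and first column set to zero.
    A0 = [[A[i][j] if i and j else {} for j in range(n)] for i in range(n)]
    row = [{} for _ in range(n)]
    row[1] = {(0, 0): 1}                           # row 1 of the identity matrix
    T1 = {}
    while any(row):                                # stops once every term exceeds order L
        T1 = padd(T1, row[last])
        nxt = [{} for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if row[i] and A0[i][j]:
                    nxt[j] = padd(nxt[j], pmul(row[i], A0[i][j], L))
        row = nxt
    return pmul(pmul(A[0][1], T1, L), A[last][0], L)   # equation (1)

print(truncated_transfer(3, (0o5, 0o7), 8))
```

For this code the closed form is $T(x,y) = x y^5 / (1 - 2xy)$, so up to order 8 the series is $x y^5 + 2x^2 y^6 + 4x^3 y^7 + 8x^4 y^8$.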

II.  State  Reduction  Algorithm 

The algorithm creates a sequence of adjacency matrices
representing equivalent FSM of order $L_m$ with one state less. It
should be observed that each non-zero path is formed by
concatenating paths that start from state 1 and reach state
$2^{K-2}$ for the first time some time later. Call the set of all
such paths $S_2$. For example, the path $\sigma = 124|124|1364$ is the
concatenation of 3 paths belonging to $S_2$. If $T_2(x,y)$ is the
generating series for the set $S_2$, we have:

$$T_1(x,y) = T_2(x,y)\left(1 - [A]_{2^{K-2},1}\, T_2(x,y)\right)^{-1}.$$

To calculate $T_2(x,y)$ we may form a sequence of equivalent
FSM where at each step we eliminate transitions from and
into the $r$th state. The $2^{K-1} \times 2^{K-1}$ adjacency matrix for
this equivalent FSM, denoted by $A(r)$, is calculated from the
adjacency matrix of the previous step $A(s)$ (obtained from the
elimination of the $s$th state) as shown in the following lemma.

Lemma 1 Let $\mathcal{R}$ and $\mathcal{C}$ be sets of indexes $l$,
$l = 1, \ldots, 2^{K-1}$, $l \ne r$, such that $[A(s)]_{l,r}$ and
$[A(s)]_{r,l}$ are different from zero, respectively. The
$(i,j)$th entries of the matrix $A(r)$ are:

$$[A(r)]_{i,j} = \begin{cases}
[A(s)]_{i,j} + [A(s)]_{i,r}\,(1 - [A(s)]_{r,r})^{-1}\,[A(s)]_{r,j}, & \text{if } i \in \mathcal{R},\ j \in \mathcal{C}; \\
0, & \text{if } i = r,\ j = 1, \ldots, 2^{K-1}; \\
0, & \text{if } j = r,\ i = 1, \ldots, 2^{K-1}; \\
[A(s)]_{i,j}, & \text{otherwise,}
\end{cases}$$

where on the first row, $[A(s)]_{i,j}$ is due to parallel transitions,
and $(1 - [A(s)]_{r,r})^{-1}$ stands for the circulation loop on the
$r$th state. The state reduction algorithm is summarized below:

•  Set $s = 0$. Find $A(0)$.

•  Form the sequence of equivalent FSM $A(r)$,
$r = 2^{K-1} - 1, \ldots, 2^{K-2} + 1, 2^{K-2} - 1, \ldots, 2$,
according to Lemma 1.

•  $T_2(x,y) = [A(2)]_{1,2^{K-2}}$.
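The three steps above can be sketched as follows for a small code. This is an illustrative reading of Lemma 1, not the authors' implementation: states are indexed from 0, every product is truncated at $y$-degree `L` (playing the role of the order $L_m$), and the inverse $(1 - p)^{-1}$ is expanded as a geometric series, which assumes each term of $p$ has $y$-degree at least 1. The example code, octal generators (5, 7) with $K = 3$, and all function names are our own assumptions.

```python
from collections import defaultdict

def pmul(p, q, L):
    """Product of polynomials {(deg_x, deg_y): coeff}, dropping y-degree above L."""
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            if b + e <= L:
                out[(a + d, b + e)] += c * f
    return dict(out)

def padd(p, q):
    out = defaultdict(int, p)
    for key, c in q.items():
        out[key] += c
    return dict(out)

def geom(p, L):
    """Truncated series of 1 / (1 - p); every term of p must have y-degree >= 1."""
    total, power = {(0, 0): 1}, {(0, 0): 1}
    while True:
        power = pmul(power, p, L)
        if not power:
            return total
        total = padd(total, power)

def adjacency(K, gens):
    n = 2 ** (K - 1)
    A = [[{} for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for u in (0, 1):
            reg = (s << 1) | u                     # assumed convention: newest bit is the LSB
            wy = sum(bin(reg & g).count("1") % 2 for g in gens)
            A[s][reg & (n - 1)] = {(u, wy): 1}
    return A

def eliminate(A, r, L):
    """One step of Lemma 1: remove state r, rerouting paths that pass through it."""
    n = len(A)
    loop = geom(A[r][r], L)                        # circulation loop on state r
    B = [[dict(A[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != r and j != r and A[i][r] and A[r][j]:
                via = pmul(pmul(A[i][r], loop, L), A[r][j], L)
                B[i][j] = padd(A[i][j], via)
    for k in range(n):
        B[r][k], B[k][r] = {}, {}
    return B

def transfer_by_state_reduction(K, gens, L):
    A = adjacency(K, gens)
    n, last = len(A), 2 ** (K - 2)
    M = [[A[i][j] if i and j else {} for j in range(n)] for i in range(n)]   # A(0)
    for r in range(n - 1, 1, -1):                  # eliminate all states except 1 and 2^(K-2)
        if r != last:
            M = eliminate(M, r, L)
    T2 = M[1][last]
    T1 = pmul(T2, geom(pmul(A[last][1], T2, L), L), L)
    return pmul(pmul(A[0][1], T1, L), A[last][0], L)

print(transfer_by_state_reduction(3, (0o5, 0o7), 8))
```

For this code only state 3 is eliminated, giving $T_2 = y/(1 - xy)$, then $T_1 = y/(1 - 2xy)$ and finally the same truncated series as the closed form $x y^5/(1 - 2xy)$.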

We propose next a modification of the algorithm which is
significant in practice. We will create a sequence of equivalent
FSM of order $L_m$ by performing the following operation: after
calculating $[A(r)]_{i,j}$, $i \in \mathcal{R}$, $j \in \mathcal{C}$, according to
Lemma 1, we compute symbolically its series expansion with
respect to the variable $y$, up to order $L_m$. The algorithm has
two new features. First, we defined combinatorial identities to
work with equivalent FSM at the level of the adjacency matrix,
which is convenient for symbolic computation. Second, no matter
the number of states, the entries of $A(r)$ are bivariate
polynomials whose powers of $y$ are of order at most $L_m$,
resulting in a truncated transfer function with considerably
lower storage requirements.
Two terminals, T1 and T2, wish to communicate over the
binary multiplying channel (BMC). To this end, they choose
sets $X$ and $Y$, respectively, of (input) vectors in $\{0,1\}^n$. If
$x \in X$ and $y \in Y$ are fed to the BMC, it gives as output the
vector $x \cdot y$, defined by $(x \cdot y)_i := x_i y_i$ for all
$i \in \{1, 2, \ldots, n\}$. Each terminal should be able to determine
unambiguously the vector transmitted by the other one, using
its own transmitted vector and the observed channel output. We
call a pair $(X,Y)$ satisfying this requirement uniquely
decodable, or UD for short. Moreover, we call a UD pair $(X,Y)$
symmetric if $X = Y$. Note that unlike [1], we do not allow
feedback, that is, encoding of a message does not depend on
the output bits observed so far.

If $(X, Y)$ is a UD pair of length $n$, we define the rate pair
$(R(X), R(Y)) = (\frac{1}{n} \log |X|, \frac{1}{n} \log |Y|)$. As usual, all
logarithms have base 2. A rate pair $(x, y)$ will be called
achievable if for each $\epsilon > 0$, there exists a UD pair $(X, Y)$
such that $R(X) > x - \epsilon$ and $R(Y) > y - \epsilon$. The set of
achievable rate pairs will be called the zero-error capacity
region of the BMC without feedback, and it will be denoted by $Z$.
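The UD requirement and the rate pair are easy to test exhaustively for small sets. The sketch below follows the definitions directly: terminal T1 knows its own $x$ and must tell apart all $y \in Y$ from the output, and symmetrically for T2. Vectors are represented as tuples of bits; the function names are hypothetical.

```python
from math import log2

def bmc_output(x, y):
    """Componentwise product of two binary vectors."""
    return tuple(a & b for a, b in zip(x, y))

def is_uniquely_decodable(X, Y):
    # T1 knows x: the map y -> x.y must be injective on Y for every x in X.
    for x in X:
        if len({bmc_output(x, y) for y in Y}) != len(Y):
            return False
    # T2 knows y: the map x -> x.y must be injective on X for every y in Y.
    for y in Y:
        if len({bmc_output(x, y) for x in X}) != len(X):
            return False
    return True

def rate_pair(X, Y):
    n = len(next(iter(X)))
    return (log2(len(X)) / n, log2(len(Y)) / n)

S = {(1, 1, 0), (1, 0, 1), (0, 1, 1)}            # a symmetric pair of length 3
print(is_uniquely_decodable(S, S), rate_pair(S, S))
print(is_uniquely_decodable({(1,)}, {(0,), (1,)}), rate_pair({(1,)}, {(0,), (1,)}))
print(is_uniquely_decodable({(0,), (1,)}, {(0,), (1,)}))
```

The weight-2 vectors of length 3 form a symmetric UD pair of rate $\frac{1}{3}\log 3 \approx 0.528$, while the full set $\{0, 1\}$ is not UD against itself because the input 0 hides the other terminal's bit.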

In  [2],  we  construct  UD  codepairs  from  cosets  of  binary 
linear  codes  with  many  information  sets  and  obtain  the 
following  theorem,  in  which  h  denotes  the  binary  entropy 
function. 

Theorem

$\{(h(R_2) + R_1 - 1,\ h(R_1) + R_2 - 1) \mid \tfrac{1}{2} < R_1, R_2 < 1\} \subseteq Z$.
For $\tfrac{1}{2} < R < 1$, the rate pair $(h(R) + R - 1,\ h(R) + R - 1)$
can be achieved with symmetric UD pairs.

Specializing the theorem to the case $R = 2/3$, we find

Corollary The rate pair $(\log(3/2), \log(3/2)) \approx (0.585, 0.585)$
can be achieved with symmetric UD pairs.
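The value in the corollary follows from a short calculation, spelled out here as a check (logarithms to base 2, $h$ the binary entropy function):

```latex
\begin{align*}
h\!\left(\tfrac{2}{3}\right)
  &= -\tfrac{2}{3}\log\tfrac{2}{3} - \tfrac{1}{3}\log\tfrac{1}{3}
   = \tfrac{2}{3}(\log 3 - 1) + \tfrac{1}{3}\log 3
   = \log 3 - \tfrac{2}{3}, \\
h\!\left(\tfrac{2}{3}\right) + \tfrac{2}{3} - 1
  &= \log 3 - 1 = \log\tfrac{3}{2} \approx 0.585 .
\end{align*}
```

The choice $R = 2/3$ is also where $h(R) + R - 1$ is largest, since its derivative $\log\frac{1-R}{R} + 1$ vanishes exactly when $(1-R)/R = 1/2$.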

The  rate  pair  of  the  corollary  yields  the  largest  known  sum 
of  the  rates  of  pairs  in  Z,  and  clearly  improves  on  the  largest 
known  sum  rate  so  far  attained  by  a  UD  pair  with  rate 
pair  (0.548,0.548)  [3].  It  follows  from  [4,  Thm.  3]  that  the 
rate  pair  (log(3/2),log(3/2))  is  the  largest  possible  that  can 
be  achieved  with  symmetric  UD  pairs.  Stated  differently, 
asymptotically  our  construction  yields  cancellative  families  of 
sets  [4]  [5,  Sec.  VII]  of  largest  possible  rate. 

The results are represented graphically in Figure 1. The
rate pairs from the theorem lie on and below the curve N,
labelled by “new rate pairs”. As $(\{1\}, \{0,1\})$ is a UD pair,
$(0,1) \in Z$; similarly, $(1,0) \in Z$. With a time sharing argument
[1, Sec. 8], it can be shown that $Z$ is convex. Consequently,
all rate pairs on and below the tangents to N through $(1,0)$
and $(0,1)$ are in $Z$. The relevant segments of these tangents
are drawn as well.

The line segment “upper bound” represents the upper bound
of [6], according to which $x + y < 1.2181$ for any $(x, y) \in Z$. As
remarked by Erik Meeuwissen in [7, Stelling 2], combination
of this upper bound with Shannon's lower bound [1, Sec. 13]
shows that the zero-error capacity region of the BMC without
feedback is strictly smaller than its $\epsilon$-error capacity region.


Fig.  1:  Graphical  representation  of  the  results.  All  points 
below  the  solid  curve  or  its  two  tangents  are  in  Z. 
Abstract — The error exponent of the two-user
Poisson multiple-access channel under peak and average
power constraints, but unlimited in bandwidth,
is considered. First, a random coding lower bound
on the error exponent is obtained, and an extension
of Wyner's single-user codes [1] is shown to be
exponentially optimum for this case as well. Second, the
sphere packing bounding technique suggested in [2] is
generalized to the case at hand and an upper bound
on the error exponent, which coincides with the lower
bound, is derived.

The model studied here assumes two independent users
that generate the inputs $\lambda_{m_i}(t)$, $i = 1, 2$, $0 \le t < \infty$, which
determine the rates of two corresponding doubly stochastic
Poisson processes $d_i(t)$. The observation is

$$\nu(t) = \sum_{i=1}^{2} d_i(t) + D(t),$$

which is also a Poisson process with instantaneous rate
$\lambda_0 + \sum_{i=1}^{2} \lambda_{m_i}(t)$. The dark current represented by
$D(t)$ is a homogeneous Poisson process of rate $\lambda_0$. It is
further assumed that the waveforms are subject to peak and
average power constraints, i.e. $0 \le \lambda_{m_i}(t) \le A$,
$\frac{1}{T}\int_0^T \lambda_{m_i}(t)\,dt \le q_i A$.

Using a DMC decomposition for our continuous-time model,
the two-user capacity region of [3] is obtained. Furthermore,
applying the rate-splitting technique of [4] to our discrete-time
model, we conclude that in the non-band-limited case
rate-splitting extends to the continuous-time Poisson channel.

Next, assuming maximum-likelihood decoding, a lower
bound on the error exponent is computed via the random
coding error exponent of this DMC decomposition. The
exponent consists of two terms: the successive decoding and
joint decoding exponents, defined respectively by ($s = \lambda_0 / A$)


An extension of the code construction of [1, part I] to the
case at hand is presented wherein a two-user code with
non-equal ($q_1 < q_2$) average-power constraint is constructed. This
is accomplished by constructing first a $(q_2, M_1 + M_2, T)$ Wyner
code and then modifying a $(q_2, M_1, T)$ subcode to conform
with the $q_1$ constraint. The resulting code exhibits the
statistical properties of a two-user “random code”; hence the
corresponding upper bounds on the successive decoding and
joint decoding error probabilities are shown to yield the
exponents (1) and (2).

We extend the approach outlined in [2] to the two-user case,
thereby obtaining a sphere-packing lower bound on the error
probability. Specifically, we associate a “volume” with the
set of all sequences representing a realization of $n$ arrivals on
$[0,T]$. Given a specific realization of $n$ arrivals, each
hypothesis of a transmitted two-user message determines a
configuration triple $(n_1, n_2, n_0)$ consisting of the number of
photon arrivals on the time slots where only one of the users is
active, both of them are active and none of them is active,
respectively. Now, each such configuration is also associated
with a corresponding volume. Using these definitions we derive
a lower bound on the error probability.

We prove that in the non-band-limited regime binary
signaling incurs but a negligible loss in the error probability.
Furthermore, it is shown that equi-energy signaling for each
of the users is optimal from the error probability aspect. These
conclusions lead to a sphere-packing exponent which coincides
with the random coding lower bound.

Using  similar  arguments  as  in  [1,  part  II]  we  show  that  the 
straight  line  bound  is  tight  for  rates  below  the  cutoff  rate. 

Consequently, the two-user Poisson MAC joins its single-user
partner as one of very few for which the reliability
function is known.
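$$- q_2 (1 + s)\left[1 + \tau_1 q_1\right]^{1+\rho} \qquad (1)$$

with

$$\tau_0 = (1 + 1/s)^{\frac{1}{1+\rho}} - 1, \qquad \tau_1 = [1 + 1/(s + 1)]^{\frac{1}{1+\rho}} - 1$$

and

$$E_{12}(\rho, q_1, q_2) = s + q_1 + q_2 - \left[(1 - q_1)(1 - q_2)\, s^{\frac{1}{1+\rho}} + (q_1 + q_2 - 2 q_1 q_2)(1 + s)^{\frac{1}{1+\rho}} + q_1 q_2 (2 + s)^{\frac{1}{1+\rho}}\right]^{1+\rho} \qquad (2)$$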
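A quick numerical sanity check of the joint decoding exponent is sketched below. It assumes the exponent has the form $s + q_1 + q_2 - [\,\cdot\,]^{1+\rho}$ with the three rate levels $s$, $1+s$ and $2+s$ raised to the power $1/(1+\rho)$, which is our reading of the damaged equation (2) and not a verified transcription. Under that reading the exponent vanishes at $\rho = 0$, is positive for $\rho > 0$, and with one user silent ($q_2 = 0$) collapses to the single-user form $s + q - s(1 + \tau_0 q)^{1+\rho}$. The function names are hypothetical.

```python
def tau0(rho, s):
    """tau_0 = (1 + 1/s)^(1/(1+rho)) - 1."""
    return (1.0 + 1.0 / s) ** (1.0 / (1.0 + rho)) - 1.0

def joint_exponent(rho, q1, q2, s):
    """Assumed form of the joint decoding exponent, normalised by the peak power A."""
    a = 1.0 / (1.0 + rho)
    inner = ((1 - q1) * (1 - q2) * s ** a          # neither user active
             + (q1 + q2 - 2 * q1 * q2) * (1 + s) ** a   # exactly one user active
             + q1 * q2 * (2 + s) ** a)             # both users active
    return s + q1 + q2 - inner ** (1.0 + rho)

print(joint_exponent(0.0, 0.3, 0.4, 0.1))          # vanishes at rho = 0
single = 0.2 + 0.3 - 0.2 * (1 + tau0(0.5, 0.2) * 0.3) ** 1.5
print(joint_exponent(0.5, 0.3, 0.0, 0.2), single)  # the two values agree
```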
Abstract — We consider uncoordinated multiple-access.
Here, a number of transmitter-receiver pairs
operate independently over a common channel and regard
the transmissions of the remaining users as random
noise. It is shown that for the uncoordinated
binary adder channel, the capacity is upper bounded
by $1/\ln 2$ bits/transmission and does not grow
logarithmically with the number of users as it does in the
coordinated case. An asymptotic lower bound for the
capacity is given. Further examples of uncoordinated
channels are studied.


I.  Introduction 

Here, we are interested in uncoordinated multiple-access. Each
transmitter has a dedicated receiver that only decodes the
messages intended for it and regards the remaining
transmissions as random noise (single-user detection).

The  following  approach  to  uncoordinated  multiple-access 
has  been  introduced  by  Cohen,  et  al.  [2]:  The  individual 
transmissions  are  treated  as  identical  single-user  channels  with 
identical  outputs.  The  activity  of  the  other  users  stimulates 
channel  transitions.  As  a  result,  transition  probabilities  are 
functions  of  the  input  distribution. 

The  (total)  capacity  of  an  uncoordinated  multiple-access 
channel  is  defined  by 


$$C_{\mathrm{uncoord.}} = T \cdot \max_{p(x)} \left( H(Y) - H(Y \mid X_i) \right) \qquad (1)$$


The  maximum  is  taken  over  the  input  distribution  (which  is 
common  to  all  users). 


II.  Binary  adder  channel 

The binary adder multiple-access channel accepts binary
input $x_i \in \{0,1\}$ from each of $T$ transmitters. The channel
output $y \in \{0, \ldots, T\}$ is the algebraic sum of the inputs,
$y = x_1 + x_2 + \cdots + x_T$.

For the coordinated binary adder multiple-access channel,
Chang and Wolf [1] found that the capacity is achieved by
$P(X_i = 0) = P(X_i = 1) = \frac{1}{2}$. It increases with the
logarithm of $T$.

Figure 1 shows one of the equivalent single-user channels for
the binary adder channel. The input probabilities are
$P(X = 1) = p$ and $P(X = 0) = 1 - p$.

We can show that the mutual information of the single-user
channels can be written as

$$I(X;Y) = \sum_{i=0}^{T-1} \binom{T-1}{i} p^i (1-p)^{T-1-i} \left[ (1-p) \log \frac{T-i}{T(1-p)} + p \log \frac{i+1}{Tp} \right]. \qquad (2)$$


Fig.  1:  Binary  adder  channel  as  seen  by  an  individual  transmitter- 
receiver  pair 


It  can  then  be  shown: 

Theorem 1 The capacity of the uncoordinated $T$-user
multiple-access binary adder channel is upper bounded by
$C \le 1/\ln 2$ bits/transmission.

The capacity does not grow with the number of users as it
does in the coordinated case.

Theorem 2 As $T \to \infty$, for the capacity of the uncoordinated
$T$-user multiple-access binary adder channel, it holds
$C_{T \to \infty} > 0.8371$ bits/transmission.
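Definition (1) can be evaluated numerically for the binary adder channel. The sketch below assumes, as the model suggests, that all $T$ users draw their inputs independently with the same $P(X = 1) = p$, so that $Y$ is Binomial$(T, p)$ and, given one user's input, the remaining sum is Binomial$(T-1, p)$; the maximisation is a plain grid search. Function names and the grid resolution are our own choices.

```python
from math import comb, log2

def entropy(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

def binomial_pmf(n, p):
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def total_rate(T, p):
    # H(Y) - H(Y | X_i) = H(Bin(T, p)) - H(Bin(T - 1, p)), summed over T users.
    return T * (entropy(binomial_pmf(T, p)) - entropy(binomial_pmf(T - 1, p)))

def uncoordinated_capacity(T, grid=1000):
    """Grid search over the common input probability p (an approximation)."""
    return max(total_rate(T, k / grid) for k in range(1, grid))

for T in (1, 2, 5, 20):
    print(T, round(uncoordinated_capacity(T), 4))
```

For $T = 1$ the channel is noiseless and the search returns 1 bit; for larger $T$ the values can be compared with the bounds of Theorems 1 and 2.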


III.  Further  channels 

In addition, the uncoordinated XOR channel and an
uncoordinated continuous-time channel are studied.
Abstract — The multiple-access relay channel
(MARC) is introduced and capacity outer and inner
bounds for it are derived.

I.  Introduction 

The spectral efficiency of mobile radio networks can be
improved by allowing each mobile station to act as a relay for
one other mobile station. One can expect further performance
improvement if each relay aids not just a single mobile
station, but many simultaneously. We attempt to quantify this
improvement by introducing the multiple-access relay channel
(MARC) and deriving capacity results for it. Most of the
discussion is restricted to the white Gaussian MARC.

II.  Model 

A white Gaussian MARC is a $K + 2$ terminal channel with
$K + 1$ inputs $X_1, X_2, \ldots, X_K, X_r$ and two outputs $Y_d$ and
$Y_r$ such that


where $Z_d$ and $Z_r$ are zero-mean Gaussian random variables
with variances $N_d$ and $N_r$, respectively. The terminal
transmitting $X_k$ sends a $B_k$ bit message to the destination
terminal receiving $Y_d$, $k = 1, \ldots, K$. A relay terminal observes
$Y_r$ and transmits $X_r$. There are block energy constraints on
the $N$ transmissions: $\sum_{n=1}^{N} E[|X_{kn}|^2]/N \le P_k$,
$k = 1, \ldots, K$, and $\sum_{n=1}^{N} E[|X_{rn}|^2]/N \le P_r$. The
capacity region $\mathcal{R}_{MARC}$ is the closure of the set of
rate-tuples $(R_1, \ldots, R_K)$, where $R_k = B_k/N$ bits per use, at
which the destination terminal can decode the $K$ messages with
arbitrarily small positive error probability.


III.  An  Outer  Bound 

One can derive the following outer bound to $R_{MARC}$ by following similar steps as in the proof of Theorem 4 in [1]. This outer bound applies to both discrete memoryless and white Gaussian MARCs. We write $X(S) = \{X_k : k \in S\}$ for a set S.


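Theorem 1 $R_{MARC}$ is contained within the convex hull of the set of rate-tuples $(R_1, \ldots, R_K)$ satisfying

$$0 \le \sum_{k \in S} R_k \le \min\left[ I(X(S); Y_r Y_d \mid X(S^c) X_r),\; I(X(S) X_r; Y_d \mid X(S^c)) \right], \qquad (2)$$

where S is any subset of {1, 2, ..., K}, $S^c$ is the complement of S in {1, 2, ..., K}, and $P(x_1, x_2, \ldots, x_K, x_r)$ factors as

$$\left[ \prod_{k=1}^{K} P(x_k) \right] \cdot P(x_r \mid x_1, \ldots, x_K). \qquad (3)$$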


1This  work  was  performed  while  this  author  was  with  Endora 
Tech  AG,  Hirschgasslein  40,  4051  Basel,  Switzerland. 


IV.  Information  Rates 

We extend the coding technique of [1, Sec. IV]. Consider the independent, zero-mean, unit-variance Gaussian random variables $V_k$ and $W_k$, k = 1, ..., K, and set

$$X_k = \sqrt{P_k} \cdot \left( \sqrt{\alpha_k}\, V_k + \sqrt{1-\alpha_k}\, W_k \right),$$
$$X_r = \sqrt{P_r} \cdot \sum_{k=1}^{K} \sqrt{\beta_k}\, V_k, \qquad (4)$$

where $0 \le \alpha_k \le 1$, $\beta_k \ge 0$ and $\sum_{k=1}^{K} \beta_k = 1$. Terminal k randomly generates a certain number $2^{N R_k^{\circ}}$ of codewords $v_k(i)$ of length N by using $P_{V_k}$ in the usual memoryless fashion. For each $v_k(i)$, terminal k generates $2^{N R_k}$ codewords $w_k$ by using $P_{W_k}$ and forms

$$x_k(i) = \sqrt{P_k} \cdot \left( \sqrt{\alpha_k}\, v_k(i) + \sqrt{1-\alpha_k}\, w_k \right).$$
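The superposition in (4) can be checked numerically. The sketch below is a hedged illustration only: it draws scalar Gaussian samples rather than codewords, and the function names and the example parameters are assumptions made for the illustration. It confirms that the chosen scaling meets the power levels $P_k$ and $P_r$ whenever the $\beta_k$ sum to one.

```python
import math
import random


def marc_sample(P, P_r, alpha, beta, rng):
    """Draw one sample (x_1..x_K, x_r) of the superposition in (4)."""
    K = len(P)
    v = [rng.gauss(0.0, 1.0) for _ in range(K)]  # V_k: component also sent by the relay
    w = [rng.gauss(0.0, 1.0) for _ in range(K)]  # W_k: component sent by terminal k only
    x = [
        math.sqrt(P[k])
        * (math.sqrt(alpha[k]) * v[k] + math.sqrt(1.0 - alpha[k]) * w[k])
        for k in range(K)
    ]
    x_r = math.sqrt(P_r) * sum(math.sqrt(beta[k]) * v[k] for k in range(K))
    return x, x_r


def empirical_powers(P, P_r, alpha, beta, n_samples=20000, seed=0):
    """Monte Carlo estimates of E[X_k^2] for each k and of E[X_r^2]."""
    rng = random.Random(seed)
    K = len(P)
    acc = [0.0] * K
    acc_r = 0.0
    for _ in range(n_samples):
        x, x_r = marc_sample(P, P_r, alpha, beta, rng)
        for k in range(K):
            acc[k] += x[k] ** 2
        acc_r += x_r ** 2
    return [a / n_samples for a in acc], acc_r / n_samples
```

Calling `empirical_powers([1.0, 2.0], 3.0, [0.3, 0.7], [0.4, 0.6])` should return estimates close to the requested powers, since $E[X_k^2] = P_k(\alpha_k + 1 - \alpha_k)$ and $E[X_r^2] = P_r \sum_k \beta_k$.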

Each $x_k(i)$ is then associated with a $v_k(j)$, where j may not be i, by using the random partitioning technique of [1, p. 575]. The transmission is in blocks of length N. Terminal k chooses that $v_k(i)$ associated with the $x_k$ of the previous block and lets the current block's message choose one of the $2^{N R_k}$ $x_k(i)$. The relay terminal is assumed to have decoded all $x_k$ of the previous block and hence knows the $v_k(i)$. It transmits $x_r = \sqrt{P_r} \cdot \sum_{k=1}^{K} \sqrt{\beta_k}\, v_k(i)$. The resulting information rates suggest that the rate-tuple $(R_1, \ldots, R_K)$ is approachable if, for all $S \subseteq \{1, \ldots, K\}$,

$$0 \le \sum_{k \in S} R_k \le \min\left[ I(X(S); Y_r \mid X(S^c) V(\{1, \ldots, K\})),\; I(X(S) V(S); Y_d \mid X(S^c) V(S^c)) \right]. \qquad (5)$$

The region of (5) can enlarge the basic K-user multiple-access capacity region, and for K = 1 it is the same region as that in [1]. However, (5) is generally smaller than the anticipated

$$0 \le \sum_{k \in S} R_k \le \min\left[ I(X(S); Y_r \mid X(S^c) X_r),\; I(X(S) X_r; Y_d \mid X(S^c)) \right], \qquad (6)$$

whose only difference to (2) is that $Y_d$ is missing in the first information inside the square brackets. Note that the same probability distribution (3) is used for (2), (5) and (6).

As a simple example, consider the case where the relay aids terminal k = 1 only, i.e., $\beta_1 = 1$. We can then set $\alpha_k = 0$ for k = 2, ..., K and can achieve the region (6). However, the probability distribution $P(x_1, x_2, \ldots, x_K, x_r)$ factors as $\left[\prod_{k=1}^{K} P(x_k)\right] \cdot P(x_r \mid x_1)$ rather than as in (3).

It is unclear whether the region of (6) is achievable with (3). In any case, we show that the region of (6) differs from that of (2) for any sum-of-rates by at most a factor of $1 + N_r/N_d$ in terms of signal-to-noise ratio. This factor is at most 2 for the usual case where $N_r \le N_d$. For K = 1 this gives a simple outer bound to any eventual rate increase over (6).

Abstract — We investigate the task of compressing an image
by  using  different  probability  models  for  compressing  different 
regions  of  the  image.  We  introduce  a  class  of  probability  models 
for  images,  the  k-rectangular  tilings  of  an  image,  that  is  formed 
by  partitioning  the  image  into  k  rectangular  regions  and  gener¬ 
ating  the  coefficients  within  each  region  by  using  a  probability 
model  selected  from  a  finite  class  of  N  probability  models.  For 
an  image  of  size  n  x  n,  we  give  a  sequential  probability  assign¬ 
ment  algorithm  that  codes  the  image  with  a  code  length  which  is 
within  0(k  log  ^ )  of  the  code  length  produced  by  the  best  prob¬ 
ability  model  in  the  class.  The  algorithm  has  a  computational 
complexity of O(Nn^3). An interesting subclass of the class of k-
rectangular  tilings  is  the  class  of  tilings  using  rectangles  whose 
widths  are  powers  of  two.  This  class  is  far  more  flexible  than 
quadtrees  and  yet  has  a  sequential  probability  assignment  algo¬ 
rithm  that  produces  a  code  length  that  is  within  Offclog  ^4) 
of the best model in the class with a computational complexity
of O(Nn^2 log n) (similar to the computational complexity of sequential probability assignment using quadtrees).

I.  Introduction 

Consider the task of compressing a wavelet subband comprising n x n wavelet coefficients that have been quantized using a scalar quantizer. For natural images, it is well known that the wavelet coefficients are small in smooth areas and large in the neighbourhood of edges. Because of that, we would like to use different probability models for coding different parts of the subband in order to obtain good compression. We will restrict ourselves to a finite number N of different probability models to choose from.

We  introduce  a  class  of  probability  models  formed  by  partitioning 
the  image  into  k  rectangular  regions  and  generating  the  coefficients 
within  each  region  by  using  a  probability  model  from  the  finite  class 
of  N  probability  models.  We  call  the  class  of  probability  models 
that  is  generated  in  this  way  the  class  of  k-rectangular  tilings  of  the 
image.  Our  algorithm  aims  to  compress  as  well  as  the  best  model  in 
this  class. 
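To make "compress as well as the best model" concrete, here is a deliberately simplified sketch. It is not the two-dimensional tiling algorithm of this abstract: it treats a single region (a one-dimensional sequence) and codes it with a uniform Bayesian mixture over the N candidate models, a standard sequential probability assignment whose code length exceeds that of the best single model by at most log2 N bits. The function name and the dictionary representation of a model are assumptions made for the illustration.

```python
import math


def mixture_code_length(symbols, models):
    """Sequentially code `symbols` with a uniform mixture of `models`.

    Each model is a dict mapping a symbol to a strictly positive probability.
    Returns (mixture code length, best single-model code length) in bits.
    """
    weights = [1.0 / len(models)] * len(models)  # uniform prior over models
    model_bits = [0.0] * len(models)
    mixture_bits = 0.0
    for s in symbols:
        probs = [m[s] for m in models]
        # Predictive probability of the mixture for the next symbol.
        predictive = sum(w * q for w, q in zip(weights, probs))
        mixture_bits += -math.log2(predictive)
        # Posterior update of the model weights.
        weights = [w * q / predictive for w, q in zip(weights, probs)]
        for j, q in enumerate(probs):
            model_bits[j] += -math.log2(q)
    return mixture_bits, min(model_bits)
```

The tiling setting additionally has to mix over the unknown partition into k rectangles, which is where the extra redundancy and the computational cost discussed in this abstract come from.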

II.  Related  Work 

The class of k-rectangular tilings can be considered as a natural extension to two dimensions of the class of piecewise-identically-distributed sources for sequences studied in information theory [6, 4].
Similar  methods  have  also  been  studied  in  computational  learning 
theory  [2,  5,  1].  In  fact,  the  method  described  in  this  abstract  is  an 
extension  of  the  specialist  method  in  [1]  to  two  dimensions. 

III.  Main  Results 

In  this  paper,  we  provide  a  sequential  probability  assignment 
algorithm  that  codes  the  image  with  a  code  length  that  is  within 

1  This  work  was  supported  in  part  by  the  National  University  of  Singapore 
Academic  Research  Fund  grant  RP3992710. 


0{  k  log  —■)  bits  of  the  code  length  produced  by  the  best  model  in 
the class of k-rectangular tilings of the image, where k does not need to be known in advance. The computational complexity of the algorithm is O(Nn^3). If we restrict the class of probability models to those generated using rectangular partitions of D discrete widths, the computational complexity can be improved to O(Nn^2 D). This means that we can have a fast algorithm of computational complexity O(Nn^2 W) for a probability assignment that is competitive with the best assignment provided by the class of k-rectangular tilings using rectangles of widths less than W. Another interesting class of models under the restriction to D discrete widths is the class of k-rectangular tilings with rectangles whose widths are powers of two. Restriction of the probability models to this class allows us to have an algorithm with a computational complexity of O(Nn^2 log n). This class is similar to the class of quadtrees but is more powerful since only one dimension is restricted to the log_2 n discrete sizes and arbitrary shifts are allowed.

Experiments on compressing wavelet transforms of images reported elsewhere [3] show that the method is practically effective.

IV.  Open  Problem 

The method described in this abstract is a sequential probability-assignment method. We do not know how to obtain efficient two-stage coding methods with good bounds on the redundancy for the class of k-rectangular tilings of an image. Such forward adaptation methods may allow the use of sophisticated quantization methods in conjunction with this class of models.

Abstract — In variable-length coding, the probability of codeword length per source letter being above (resp. below) a prescribed threshold is called the overflow (resp. the underflow) probability. In this study, we show that the infimum achievable threshold given the overflow probability exponent r always coincides with the infimum achievable fixed-length coding rate given the error exponent r, without any assumptions on the source. In the case of underflow probability, we also show similar results. From these results, we can utilize various theorems and results on fixed-length coding established by Han for the analysis of overflow and underflow probabilities.

I.  General  Sources 

Let us define a general source as an infinite sequence $\mathbf{X} = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$ of n-dimensional random variables $X^n$, where each component random variable $X_i^{(n)}$ ($1 \le i \le n$) takes values in a countably infinite set $\mathcal{X}$ which is called the source alphabet. It should be noted here that each component of $X^n$ may change depending on block length n. This implies that the sequence $\mathbf{X}$ is quite general in the sense that it may not even satisfy the consistency condition as usual processes do. The class of sources thus defined covers a very wide range of sources including all nonstationary and/or nonergodic sources.


III.  Main  Results 

Definition 1: R is called an r-achievable overflow threshold if there exists a prefix variable-length encoder $\varphi_n$ such that

$$\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\varepsilon_n(\varphi_n, R)} \ge r.$$

Moreover, we define the infimum r-achievable overflow threshold by

$$L_e(r|\mathbf{X}) = \inf\{R \mid R \text{ is an } r\text{-achievable overflow threshold}\}.$$

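Theorem 1: For any general source $\mathbf{X}$ with countably infinite alphabet $\mathcal{X}$ and all r > 0, we have

$$L_e(r|\mathbf{X}) = R_e(r|\mathbf{X}),$$

where $R_e(r|\mathbf{X})$ is the infimum r-achievable fixed-length coding rate [2], and it has been shown by Han [2] that $R_e(r|\mathbf{X}) = \sup_{R \ge 0} \{R - \sigma(R) \mid \sigma(R) < r\}$, where

$$\sigma(R) = \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge R \right\}}.$$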


Definition 2: R is called an r-achievable underflow threshold if there exists a prefix variable-length encoder $\varphi_n$ such that

$$\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{\varepsilon_n^*(\varphi_n, R)} \le r.$$

Moreover, we define the infimum r-achievable underflow threshold by


II.  Overflow  and  Underflow  Probabilities 

Let $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ and $\psi_n : \mathcal{U}^* \to \mathcal{X}^n$ be a prefix variable-length encoder (a one-to-one mapping) and the decoder (the inverse mapping of the encoder), respectively, where $\mathcal{U} = \{1, 2, \ldots, K\}$ is called the code alphabet and $\mathcal{U}^*$ is the set of all (non-null) finite-length strings from $\mathcal{U}$. Then, let us define the overflow probability of the prefix variable-length encoder $\varphi_n$ with threshold R by

$$\varepsilon_n(\varphi_n, R) = \Pr\left\{ \frac{1}{n}\, l(\varphi_n(X^n)) > R \right\},$$

where $l(u)$ denotes the length of $u \in \mathcal{U}^*$. We also define the underflow probability of the prefix variable-length encoder $\varphi_n$ with threshold R by

$$\varepsilon_n^*(\varphi_n, R) = \Pr\left\{ \frac{1}{n}\, l(\varphi_n(X^n)) < R \right\}.$$

For unifilar finite-state sources, Merhav [1] has shown that the optimal exponential decay rate of the overflow probability is equal to the optimal error exponent for fixed-length coding, and this optimal decay rate can be universally achieved by using the Lempel-Ziv code.

1 O. Uchida is now with the Dept. of Network Engineering, Kana-

$$L_e^*(r|\mathbf{X}) = \inf\{R \mid R \text{ is an } r\text{-achievable underflow threshold}\}.$$

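Theorem 2: For any general source $\mathbf{X}$ with countably infinite alphabet $\mathcal{X}$ and all r > 0, we have

$$L_e^*(r|\mathbf{X}) = R_e^*(r|\mathbf{X}),$$

where $R_e^*(r|\mathbf{X})$ is the infimum r-achievable fixed-length coding rate [2], and it has been shown by Han [2] that $R_e^*(r|\mathbf{X}) = \inf\{h \ge 0 \mid \inf_{R \ge 0} \{\sigma^*(R) + [R - \sigma^*(R) - h]^+\} \le r\}$, where $\sigma^*(R) = \lim_{n\to\infty} \frac{1}{n} \log \frac{1}{\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \le R \right\}}$, and $[x]^+ = \max\{x, 0\}$.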

Remark: In [2], Han has shown examples of the computation of $R_e(r|\mathbf{X})$ and $R_e^*(r|\mathbf{X})$ for many kinds of sources $\mathbf{X}$. These examples include countably infinite alphabet cases that cannot be treated by the traditional method of types. From Theorems 1 and 2, we can use all of these results to derive the values of $L_e(r|\mathbf{X})$ and $L_e^*(r|\mathbf{X})$.

Abstract — We consider the problem of finding the quantizer Q that quantizes the K-dimensional causal context $C_i = (X_{i-t_1}, X_{i-t_2}, \ldots, X_{i-t_K})$ of a source symbol $X_i$ into one of M conditioning states such that the conditional entropy $H(X_i \mid Q(C_i))$ is minimized. The resulting minimum conditional entropy context quantizer can be used for sequential coding of the sequence $X_0, X_1, X_2, \ldots$.

A key problem in sequential source coding of a discrete random sequence $X_0, X_1, X_2, \ldots$ is modeling the underlying conditional distribution of the source $P(X_i \mid X^{i-1})$. Because of model estimation considerations, it is not possible to directly use all of $X^{i-1}$ as the model's context. Many practical source coders choose a priori a model with fixed complexity, based on domain knowledge such as correlation structure and typical data length, and estimate only the model parameters. To avoid the context dilution problem, we quantize the modeling context into a relatively small number of conditioning states, and estimate $P(X_i \mid Q(C_i))$ instead, where Q is a context quantizer. This approach has produced some of the best performing signal compression algorithms such as CALIC and JPEG 2000, despite the fact that they are not strictly universal. A pivotal issue for these source coders, which impacts their rate-distortion performance, is the design of the context quantizer Q. The problem is one of optimal vector quantization design with respect to the Kullback-Leibler distance.

Let Y be a discrete random variable, and let C be a jointly distributed random vector, possibly real. Given a positive integer M, we wish to find the quantizer $Q : C \to \{1, 2, \ldots, M\}$ such that $H(Y \mid Q(C))$ is minimized. Clearly, $H(Y \mid Q(C)) \ge H(Y \mid C)$ by the concavity of H. However, we wish to make $H(Y \mid Q(C))$ as close to $H(Y \mid C)$ as possible. Equivalently, we wish to minimize the non-negative "distortion" of Q

$$D(Q) = H(Y \mid Q(C)) - H(Y \mid C) = \int dP(c)\, D\left(P_{Y|C=c} \,\|\, P_{Y|Q(C)=Q(c)}\right), \qquad (1)$$

which is the average, over all context vectors c, of the Kullback-Leibler distances between the probability mass functions (pmfs) $P_{Y|C}(\cdot \mid c)$ and their "reproduction" pmfs $P_{Y|Q(C)}(\cdot \mid Q(c))$.

Let $\theta_m(y) = P_{Y|Q(C)}(y \mid m)$ denote the mth reproduction pmf. Then an optimal Q must map almost all context vectors c to the conditioning state m that minimizes the Kullback-Leibler distance $D(P_{Y|C=c} \,\|\, \theta_m)$, i.e.,

$$Q(c) = \arg\min_{m} D(P_{Y|C=c} \,\|\, \theta_m). \qquad (2)$$

The quantization regions $A_m = \{c : Q(c) = m\}$, m = 1, ..., M, of a minimum conditional entropy context quantizer are generally quite complex in shape, and may not even be convex or connected. However, their associated sets of pmfs $B_m = \{P_{Y|C}(\cdot \mid c) : c \in A_m\}$ are simple convex sets in the probability simplex for Y, owing to the above necessary condition for optimal Q. Let $\theta_m(y) = P(y \mid C \in A_m)$ be the conditional distribution of Y given $C \in A_m$. Then by (2), for each $c \in A_m$, the Kullback-Leibler distance from $P_{Y|C}(y \mid c)$ to $\theta_m(y)$ must be less than (or equal to) the Kullback-Leibler distance to $\theta_{m'}(y)$, $m' \ne m$. Hence

$$\sum_{y} P_{Y|C}(y \mid c) \log \frac{1}{\theta_m(y)} \le \sum_{y} P_{Y|C}(y \mid c) \log \frac{1}{\theta_{m'}(y)} \qquad (3)$$

for all $m' \ne m$. Both sides are linear in the pmf $P_{Y|C}(\cdot \mid c)$. In other words, if $c \in A_m$, then $P(y \mid c)$ lies in an intersection of halfspaces.

If Y is a binary random variable, then its probability simplex is one-dimensional. In this case, the quantization regions $B_m$ are simple intervals. If the random variable Z is defined as $P_{Y|C}(1 \mid C)$ (the posterior probability that Y = 1 as a function of C), then the conditional entropy $H(Y \mid Q(C))$ of the optimal context quantizer can be expressed

$$H(Y \mid Q(C)) = \sum_{m=1}^{M} P\{Z \in [q_{m-1}, q_m)\}\, H(Y \mid Z \in [q_{m-1}, q_m)) \qquad (4)$$

for some set of thresholds $\{q_m\}$. Therefore, the optimal context quantizer can be found by searching over $\{q_m\}$. This is a scalar quantization problem, which can be solved exactly using dynamic programming, regardless of the dimensionality of the context space. Once the scalar problem is solved, the optimal context quantizer cells $A_m$ are given by

$$A_m = \{c : P_{Y|C}(1 \mid c) \in [q_{m-1}, q_m)\}. \qquad (5)$$

In particular, the boundaries between these cells are determined by those vectors c for which the posterior probability $P_{Y|C}(1 \mid c)$ is a constant: for example, $P_{Y|C}(1 \mid c) = q_m$ for c along the boundary between $A_m$ and $A_{m+1}$. Equivalently, $A_m$ can be expressed in terms of the likelihood ratio

$$\ell(c) = \frac{P_{C|Y}(c \mid 1)}{P_{C|Y}(c \mid 0)} = \frac{P_Y(0)}{P_Y(1)} \cdot \frac{P_{Y|C}(1 \mid c)}{1 - P_{Y|C}(1 \mid c)}. \qquad (6)$$

If both $P_{C|Y}(c \mid 0)$ and $P_{C|Y}(c \mid 1)$ are d-dimensional Gaussians, then the optimal context quantizer cells are bounded by d-dimensional quadratic surfaces.
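For binary Y, the reduction to a scalar problem can be sketched in a few lines. The code below is an illustrative sketch rather than the authors' implementation: it assumes a finite list of contexts, each given by its probability P(c) and its posterior z = P(Y = 1 | c), sorts the contexts by z, and uses dynamic programming over contiguous groups to minimize the conditional entropy in (4). All function and variable names are hypothetical.

```python
import math


def binary_entropy(q):
    """Binary entropy in bits; returns 0 outside the open interval (0, 1)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def optimal_context_quantizer(contexts, M):
    """contexts: list of (P(c), P(Y=1|c)) pairs; requires M <= len(contexts).

    Returns (minimum H(Y|Q(C)) in bits, list of M cells of context indices).
    """
    order = sorted(range(len(contexts)), key=lambda i: contexts[i][1])
    n = len(order)
    W = [0.0]  # prefix sums of P(c)
    S = [0.0]  # prefix sums of P(c) * P(Y=1|c)
    for i in order:
        prob, z = contexts[i]
        W.append(W[-1] + prob)
        S.append(S[-1] + prob * z)

    def cost(i, j):
        """Contribution P(cell) * H(Y | cell) of sorted contexts i..j-1."""
        w = W[j] - W[i]
        return w * binary_entropy((S[j] - S[i]) / w) if w > 0 else 0.0

    INF = float("inf")
    best = [[INF] * (n + 1) for _ in range(M + 1)]
    back = [[0] * (n + 1) for _ in range(M + 1)]
    best[0][0] = 0.0
    for m in range(1, M + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j], back[m][j] = c, i
    cells, j = [], n
    for m in range(M, 0, -1):  # trace the optimal boundaries back
        i = back[m][j]
        cells.append([order[t] for t in range(i, j)])
        j = i
    cells.reverse()
    return best[M][n], cells
```

Each returned cell is an interval of posterior values, matching (5); the running time of this sketch is cubic in neither dimension of the context but quadratic in the number of distinct contexts per conditioning state.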

The significance of this research lies in that it offers a constructive means of designing optimal source codes for minimum code length via high-order context modeling. The problem of controlling model cost in high-order context modeling is addressed by designing an optimal context quantizer, which collapses high-order contexts into any given number of coding states in a way that minimizes the actual code length. Once the context quantizer Q is designed, on-line estimation of $P(\cdot \mid Q(C))$ by count statistics and adaptive entropy coding can be done very efficiently, much faster than by context tree methods. We observe that our techniques often outperform the universal source codes of proven optimality by appreciable margins on real data in image, video, and audio compression.

Abstract — In this paper, we show the probability of
length  overflow  of  several  codes  by  using  the  variance 
and  the  asymptotic  normality  of  the  codelength. 

I.  Introduction 

Lossless source coding schemes are examined under several criteria. The most representative criterion is redundancy. Recently, Merhav [1] proposed the probability of length overflow.

In this paper we redefine the probability of length overflow. We consider a finite alphabet source $\mathcal{X} = \{i : 0 \le i \le k - 1\}$. Let $x^n = x_1 x_2 \cdots x_n \in \mathcal{X}^n$ denote a source sequence, and let $p(x^n)$ denote the probability distribution of a source. Let $L(\cdot)$ be a codelength and $\varepsilon_n$ be a function of n.

Definition I.1 The probability of length overflow is defined by

$$\Pr\{L(x^n) > \varepsilon_n\}. \qquad (1)$$

We shall evaluate a code by using the probability of length overflow instead of the expected codelength.

Next we define two quantities that play a very important role in this paper. First we generalize the minimal coding variance, which is an inherent value of a source, proposed by Kontoyiannis [2].

Definition I.2 The rth moment of self-information is defined by

$$M(X)^r = \lim_{n\to\infty} E\left[ \left\{ -\frac{1}{\sqrt{n}} \log p(X^n) - E\left[ -\frac{1}{\sqrt{n}} \log p(X^n) \right] \right\}^r \right].$$

Especially, the 2nd moment of self-information coincides with the minimal coding variance.


When a source distribution is known, it is well known that the codelength $-\log p(x^n)$ minimizes the expected codelength. We call this code the Shannon code and let $\sigma_s^2$ be the variance of the Shannon code. Obviously, the variance of the Shannon codelength coincides with the 2nd moment of self-information. Here we define a condition of a source as follows.

Condition II.1 The codelength of the Shannon code with respect to a source satisfies the asymptotic normality.

Then we have the following lemma.

Lemma II.2 Under Condition II.1, if $\lim_{n\to\infty} \varepsilon_n > nH(X) + \sqrt{n M(X)^2}$, then we have

$$\lim_{n\to\infty} \Pr\{-\log p(x^n) > \varepsilon_n\} = 0. \qquad (4)$$
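The overflow probability of the Shannon code can be computed exactly for a simple source and compared with the Gaussian tail suggested by asymptotic normality. The sketch below assumes an i.i.d. Bernoulli(p) source with 0 < p < 1 and p ≠ 1/2, and the ideal (non-integer) codelength $-\log_2 p(x^n)$; these choices and all function names are assumptions made for the illustration.

```python
import math


def bernoulli_entropy_bits(p):
    """Entropy H(X) in bits of a Bernoulli(p) source."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def self_information_variance(p):
    """Per-symbol variance of -log2 p(X) for X ~ Bernoulli(p)."""
    return p * (1 - p) * math.log2((1 - p) / p) ** 2


def shannon_overflow_probability(n, p, threshold_bits):
    """Exact Pr{-log2 p(x^n) > threshold_bits} for an i.i.d. Bernoulli(p) source."""
    total = 0.0
    for k in range(n + 1):  # k = number of ones in x^n
        length = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
        if length > threshold_bits:
            # Binomial pmf evaluated in the log domain to avoid overflow.
            log_pmf = (
                math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1 - p)
            )
            total += math.exp(log_pmf)
    return total


def normal_approximation(n, p, threshold_bits):
    """Gaussian-tail approximation of the same overflow probability."""
    mean = n * bernoulli_entropy_bits(p)
    std = math.sqrt(n * self_information_variance(p))
    z = (threshold_bits - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Evaluating both functions for thresholds of the form $nH(X) + a\sqrt{n}$ with different constants a shows how the overflow probability depends on the size of the margin above $nH(X)$ relative to $\sqrt{n}$.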


III.  The  probability  of  length  overflow  of 
Bayes  code 

We consider a parameterized source distribution. Let $\theta \in \Theta$ be a k-dimensional parameter of a source. If $\theta$ is unknown, it is known that the Bayes code minimizes the redundancy with respect to the Bayes criterion. The coding probability of the Bayes code is given by $m(x^n) = \int_{\theta \in \Theta} p(x^n \mid \theta)\, p(\theta)\, d\theta$, where $p(\theta)$ is a prior distribution of $\theta$. We define a condition of a source.

Condition  III.  1  The  codelength  of  Bayes  code  with  respect 
to  a  source  satisfies  the  asymptotic  normality. 

Then  we  have  the  following  theorem. 

Theorem  III.  1  Let  the  variance  of  Bayes  code  denoted  by 
we  have 


Second we define the moment of codelength.

Definition I.3 Let $L_c(x^n)$ denote the codelength for sequence $x^n$ when we use a code c. Then the rth moment of a code c is denoted by

$$\sigma_c^r = \lim_{n\to\infty} E\left[ \left\{ \frac{1}{\sqrt{n}} L_c(x^n) - E\left[ \frac{1}{\sqrt{n}} L_c(X^n) \right] \right\}^r \right]. \qquad (2)$$


Especially, when r = 2 we call this the variance of codelength of a code c.

From the above theorem, we have the following lemma.

Lemma III.1 Under Condition III.1, if $\lim_{n\to\infty} \varepsilon_n > nH(X) + \sqrt{n M(X)^2}$, then we have

$$\lim_{n\to\infty} \Pr\{-\log m(x^n) > \varepsilon_n\} = 0. \qquad (6)$$


II.  The  probability  of  length  overflow 

We  show  the  probability  of  length  overflow  of  a  code  c.  Let 
Lc(xn)  denote  the  codelength  of  a  code  c  for  xn . 


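Lemma II.1 If the codelength of a code c satisfies asymptotic normality with respect to a source, the probability of length overflow of a code c is given by

$$\lim_{n\to\infty} \Pr\{L_c(x^n) > \varepsilon_n\} = \int_{z^*}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) dy, \qquad (3)$$

where $z^*$ is the limiting value of the threshold $\varepsilon_n$ after centering by the mean codelength and scaling by $\sqrt{n}\,\sigma_c$, and $\sigma_c^2$ is the variance of the codelength of code c.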

IV. Consideration

We obtained the probability of length overflow of codes that minimize the expected codelength. From the above lemmas, whether the source distribution is known or unknown, under Conditions II.1 and III.1, if we wish the probability of length overflow to go to 0, then it is necessary that $\lim_{n\to\infty} \varepsilon_n > nH(X) + \sqrt{n M(X)^2}$.

We introduced the moment of self-information and the moment of codelength, which play a very important role in analysing the probability of length overflow.

Abstract — Watermarking (WM) codes are analyzed from an information-theoretic viewpoint as identification (ID) codes with side information that is available either at both transmitter and receiver or at the transmitter only. For the former case, formulas are provided for the ID capacity and for achievable error exponents. For the latter case, upper and lower bounds to the ID capacity are derived.

WM techniques are about embedding a message into a covertext dataset (say, an image) such that on the one hand, quality is maintained, and on the other hand, this message cannot be removed without access to some secret key or without rendering the data useless. The main application is for proving ownership of the data and for protection against forgers.

In contrast to the vast amount of research work reported in the signal/image processing literature, relatively little attention has been devoted to this problem from the information-theoretic perspective. A few exceptions are, e.g., [2], [3], [5], [6], where attempts were made to characterize capacity and/or error exponents of WM systems by viewing them as coded communication systems, where the covertext data plays the role of side information available at the encoder only or at both ends (depending on the application).

More precisely, consider the following system: A rate-R block code of length n, fed by an (nR)-bit message m and an n-block of a memoryless covertext source V, generates an n-block of the watermarked version X, within small degradation of quality, symbolized by distortion $Ed(V, X) \le D_1$. An active attacker, modeled as a memoryless channel $W : X \to Y$, may introduce additional distortion $Ed(X, Y) \le D_2$ in an attempt to disrupt the watermark. Finally, Y is decoded at the receiving end, with or without access to the covertext V, in order to extract the watermark.

In all the above-mentioned papers, WM systems were viewed as ordinary communication systems, where the decoder carries out full decoding, i.e., decides which one of $2^{nR}$ possible messages was embedded. In most of the applications, however, full decoding is not really necessary, as one needs only to detect whether or not a particular watermark resides in the covertext. Performance, in this case, is measured by the tradeoffs between rate, false-alarm probability and misdetection probability. This observation guides us to view WM codes as ID codes [1] rather than ordinary transmission codes.

Since in the ID setting, both false-alarm and misdetection probabilities (of each individual message) can be kept arbitrarily small for large n even for a doubly exponential number of messages (when randomized encoders are allowed), the ID WM capacity is defined as the limsup of the normalized iterated logarithm of the maximum achievable number of messages defined by an encoder that satisfies the distortion constraint.

Our  main  results  are  as  follows  (for  proofs,  see  [7]): 


Theorem 1 For a discrete memoryless covertext source V, available at both transmitter and receiver, and a given DMC W, the ID WM capacity $C_1$ is given by

$$C_1 = H(V) + \sup I(X; Y \mid V), \qquad (1)$$

where the supremum is over all triples (V, X, Y) distributed according to $P(V, X, Y) = P(V) P(X \mid V) W(Y \mid X)$ with $Ed(V, X) \le D_1$.

Theorem 2 For a discrete memoryless covertext source V, available at the transmitter only, and a given DMC W, the ID WM capacity $C_2$ is bounded by

$$\sup_{B} I(U; Y) \le C_2 \le \sup_{A} I(U; Y), \qquad (2)$$

where A is the set of all quadruples (U, V, X, Y) distributed according to $P(U, V, X, Y) = P(V) P(X, U \mid V) W(Y \mid X)$ with $Ed(V, X) \le D_1$, and B is the same as A but with the additional constraint that $I(U; V) < I(U; Y)$.

Two comments: (i) The direct part of Theorem 1 includes a more refined analysis (see [7]) that characterizes a set of achievable triples $(R, E_1, E_2)$, where $E_1$ and $E_2$ are exponential rates of the error probabilities of the two kinds. As $E_1$ and $E_2$ tend to zero, the maximum achievable rate is $R = C_1$. (ii) It is known that in ID problems, if both transmitter and receiver have access to a common information source (common experiment) Z, then the ID capacity is increased by the entropy of Z. In Theorem 1, obviously Z = V. In Theorem 2, the receiver can partially guess V with a common information rate of $I(U; V)$, which when added to $I(U; Y) - I(U; V)$ (corresponding to the transmission capacity with side information at the transmitter only [4]), gives $I(U; Y)$. Accordingly, the additional constraint of set B in Theorem 2 means that the transmission capacity is positive.

Abstract — We consider the problem of embedding one signal (e.g., a digital watermark), within another "host" signal to form a third, "composite" signal. The goal is to achieve efficient rate-distortion-robustness trade-offs. We introduce a new class of embedding methods called distortion-compensated quantization index modulation. In several different contexts involving both intentional and unintentional attacks, capacity-achieving methods exist within this class, while in other contexts these methods achieve provably better rate-distortion-robustness performance than previously proposed spread-spectrum and generalized low-bit(s) modulation methods.

I.  Introduction 

Digital watermarking and information embedding systems embed information in a host signal, which is typically an image, audio signal, or video signal. The host signal is not degraded unacceptably in the process, and one can recover the watermark even if the composite host and watermark signal undergo a variety of attacks as long as these corruptions do not unacceptably degrade the host signal. These systems play an important role in at least three major application areas: (1) copyright protection of multimedia content, (2) authentication and tamper-detection, and (3) backwards-compatible upgrading of existing legacy communication networks [1].

II.  Problem  Model 

We wish to embed a message $m \in \{1, 2, \ldots, 2^{N R_m}\}$, sometimes called a digital watermark, in some host signal vector $x \in \mathbb{R}^N$, where $R_m$ is the embedding rate in bits per host signal sample. Specifically, m and x are mapped onto a composite signal vector $s \in \mathbb{R}^N$ using some embedding function s(x, m), and we define a distortion measure between x and s. Equivalently, we can define a host-dependent distortion signal e(x, m) that is added to x to obtain s. The composite signal s is subjected to unintentional attacks and possibly to intentional attacks inside some channel, which produces an output vector $y \in \mathbb{R}^N$. A decoder generates an estimate $\hat{m}$ of m after observing y, i.e., we consider the "host-blind" case, where x is not available to the decoder. Ideally, the decoder can reliably recover the embedded information as long as the channel degradations are not too severe. Thus, the tolerable severity of the degradations is a measure of the robustness of the system. The goodness of s(x, m) and its corresponding decoder is measured by the achievable rate-distortion-robustness trade-offs.
National Defense Science and Engineering Graduate Fellowship.

Fig. 1: Quantization index modulation information embedding.

III.  Distortion-Compensated  Quantization 
Index  Modulation 

Quantization index modulation (QIM) embedding functions arise by
defining an ensemble of quantizers $q(\cdot; m)$, one quantizer in the
ensemble for each possible value of $m$. Then, $s(x, m) = q(x; m)$.
An example is shown in Fig. 1 for the case where $N = 1$, $R_m = 1$,
and the quantizers are uniform, scalar quantizers. One can decode, for
example, by determining whether $y$ is closer to a ○ point ($m = 1$)
or to a × point ($m = 2$). Thus, the × and ○ points represent both
source codewords for representing $x$ and channel codewords for
communicating $m$. QIM systems reject interference from the host
signal since $x$ determines which ○ or × point is chosen but does not
deflect $s$ or $y$ away from these points. Distortion-compensated QIM
(DC-QIM) systems add back some fraction $1 - \alpha$ of the
quantization error,
$s(x, m) = q(x; m) + (1 - \alpha)[x - q(x; m)]$,
which can be shown [1] to improve rate-distortion-robustness
performance with the proper choice of $\alpha$.
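
As a rough illustration of the scalar case in Fig. 1, the following
Python sketch implements two uniform quantizers whose reconstruction
points are offset by half a step, the QIM and DC-QIM embedding rules
stated above, and minimum-distance decoding. The step size `delta`,
the half-step offset, and all function names are assumptions made for
this example; they are not taken from [1].

```python
import math


def quantize(x, m, delta):
    """Uniform scalar quantizer for message m in {1, 2}.

    Assumption: the two lattices share step `delta` and are offset
    from each other by delta / 2.
    """
    offset = 0.0 if m == 1 else delta / 2.0
    return delta * math.floor((x - offset) / delta + 0.5) + offset


def embed_qim(x, m, delta):
    """Plain QIM: the composite sample is the quantized host sample."""
    return quantize(x, m, delta)


def embed_dc_qim(x, m, delta, alpha):
    """DC-QIM: add back a fraction (1 - alpha) of the quantization error."""
    q = quantize(x, m, delta)
    return q + (1.0 - alpha) * (x - q)


def decode(y, delta):
    """Pick the message whose quantizer has a point closest to y."""
    return min((1, 2), key=lambda m: abs(y - quantize(y, m, delta)))
```

With these assumed quantizers the compensation term moves $s$ at most
$(1 - \alpha)\Delta/2$ away from the chosen point, so noiseless
minimum-distance decoding is guaranteed whenever $\alpha > 1/2$. The
sketch says nothing about how $\alpha$ should be chosen for a given
attack.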

IV.  Performance  Against  Attacks 

In fact, one can derive sufficient conditions under which
capacity-achieving DC-QIM systems exist [1]. These conditions are
satisfied in at least three cases: (1) the additive Gaussian noise
channel and Gaussian host signal scenario of [2], (2) the case of
squared error distortion-constrained attacks and a Gaussian host
signal described in [3], and (3) the case of squared error
distortion-constrained attacks, a non-Gaussian host signal,
asymptotically small embedding-induced distortion, and asymptotically
small attacker’s distortion described in [3].

In a number of other contexts where the capacity is unknown, DC-QIM
methods achieve provably better performance than previously proposed
additive spread-spectrum methods, which do not reject interference
from the host signal, and generalized low-bit(s) modulation methods.
These cases are discussed in [1], along with practical implementations
of DC-QIM and QIM systems.

Abstract — We consider a watermarking system where $2^{nR_w}$ distinct
Gaussian watermarks are embedded in respective copies of an
$n$-dimensional i.i.d. Gaussian image. Copies are distributed to
customers in digital form, using $R_q$ bits per image dimension. We
establish the rate region for the pair $(R_q, R_w)$ such that (i) the
average quadratic distortion between the original image and each
distributed copy is no more than a specified level; and (ii) the error
probability in decoding the embedded watermark in the distributed copy
approaches zero asymptotically in $n$.

I.  Problem  Formulation 

Recently, there have been some information-theoretic approaches to the
analysis of watermarking systems. Of particular interest is [1], which
gives a general expression for the maximum rate of the set of messages
that can be hidden within a host data set subject to a distortion
constraint, as well as the requirement that the message withstand a
deliberate attack aimed to destroy it.

In this paper, we study a related problem that combines source and
channel coding in a watermarking framework. This problem is motivated
by the following scenario. A data distributor (e.g., a news agency)
has to deliver an information sequence $I^n$ (e.g., a digital image)
to $M_n = 2^{nR_w}$ customers, such that each customer receives a
different watermarked version of $I^n$. To that end, the agent creates
$M_n$ watermarks $X^n(1), \ldots, X^n(M_n)$ independently of $I^n$,
and uses them to generate the watermarked copies
$Y^n(k) = I^n + X^n(k)$, $k = 1, \ldots, M_n$. Due to bandwidth
limitations, the agent compresses the watermarked data at a rate of
$R_q$ bits per image dimension subject to a fidelity criterion prior
to distribution.

For security purposes as well as for maximum usability, we assume that
both the quantization and the reconstruction of the image are
independent of the choice of the watermark set. In addition, the agent
who generated the image should be able to discern which watermark is
present in a digital image with a low probability of error $P_e$
(e.g., in case an authenticator needs to track down the initial owner
of an illegally distributed image). Therefore, watermarks and source
codewords have to be designed in such a way that knowledge of the
watermark set and the original data is enough for reliably detecting
the watermark in a compressed, watermarked image.

The main result of this paper is the determination of the allowable
rates $R_q$ and $R_w$ for the above system, under the following
assumptions: (i) $I^n$ is i.i.d. $\mathcal{N}(0, P_I)$, (ii) the
watermarks $X^n(1), \ldots, X^n(M_n)$ are generated i.i.d.
$\mathcal{N}(0, P_X)$ with $P_X < P_I$, and (iii) the distortion
constraint $n^{-1} E[\|I^n - \hat{Y}^n\|^2] \le D$ is met ($\hat{Y}^n$
is the quantized version of $Y^n$). Unlike the case in [1], here we
consider a single fidelity criterion, namely the resultant distortion
between the original data sequence and the watermarked/quantized data.
Also, while quantization degrades the original image, it cannot be
construed as a malicious attack of the type modeled in [1]. In our
case, data compression and watermarking are cooperative (not
competing) schemes, and must be optimized jointly.

Adrian Papamarcou
Department of Electrical and Computer Engineering
University of Maryland
College Park, MD 20742, USA
e-mail: adrian@eng.umd.edu

Fig. 1: For any distortion constraint $D$, the shaded area represents
the region $\mathcal{R}_D$ of achievable pairs $(R_q, R_w)$. As $D$
varies, the minimum source coding rate $r_q(D)$ and the maximum
corresponding watermarking rate $r_w(r_q(D), D)$ parametrically define
curve $C$.

II.  Results 

The coding theorem that establishes the bounds on $R_q$ and $R_w$
consists of two parts. The forward theorem demonstrates the existence
of a source code for $Y^n$ and an i.i.d. Gaussian random code for the
watermark set such that the distortion constraint is satisfied and the
probability of error $P_e$ is arbitrarily small, as long as
$(R_q, R_w)$ belongs to some region $\mathcal{R}_D$. The converse
theorem shows that if an arbitrary source code and an i.i.d. Gaussian
watermark code jointly satisfy the distortion constraint and yield an
asymptotically vanishing $P_e$, then $(R_q, R_w)$ must lie in
$\mathcal{R}_D$. We proved that $\mathcal{R}_D$ is characterized as
follows:

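$$R_q \ge r_q(D) = \frac{1}{2} \log\left(\frac{P_I^2}{(P_I + P_X) D - P_I P_X}\right)$$

$$R_w \le r_w(R_q, D) = R_q - \frac{1}{2} \log\left(\frac{P_I}{D}\right)$$

where $\frac{P_I P_X}{P_I + P_X} < D < P_I$ (all distortion values of
interest). The graphical representation of these results is given in
Figure 1.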
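
To make the shape of the region concrete, the following Python sketch
evaluates the two bounds numerically. It assumes the bounds read
$r_q(D) = \frac{1}{2}\log_2\bigl(P_I^2 / ((P_I + P_X)D - P_I P_X)\bigr)$
and $r_w(R_q, D) = R_q - \frac{1}{2}\log_2(P_I / D)$, with logarithms
taken to base 2 because the rates are stated in bits; the function
names and the example powers are hypothetical.

```python
import math


def min_distortion(p_i, p_x):
    """Smallest distortion of interest, P_I * P_X / (P_I + P_X)."""
    return p_i * p_x / (p_i + p_x)


def r_q(d, p_i, p_x):
    """Assumed minimum source coding rate in bits per image dimension."""
    # The endpoint d == p_i is accepted here for convenience; it gives rate 0.
    if not (min_distortion(p_i, p_x) < d <= p_i):
        raise ValueError("distortion outside the range of interest")
    return 0.5 * math.log2(p_i ** 2 / ((p_i + p_x) * d - p_i * p_x))


def r_w(rq, d, p_i):
    """Assumed maximum watermarking rate for quantization rate rq."""
    return rq - 0.5 * math.log2(p_i / d)
```

For example, with the hypothetical values $P_I = 4$, $P_X = 1$ and
$D = 2$, `r_w(r_q(2.0, 4.0, 1.0), 2.0, 4.0)` gives the point of curve
$C$ associated with that distortion level under the stated
assumptions.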

Abstract — We compute the value of the watermarking game for a
Gaussian covertext and squared-error distortions. Both the public
version of the game (covertext known to neither attacker nor decoder)
and the private version of the game (covertext unknown to attacker but
known to decoder) are treated. Surprisingly, the two versions yield
identical values.

I.  Introduction 

The watermarking game [1, 2] can model a situation where an original
source sequence (“covertext”) needs to be copyright-protected before
it is distributed to the public. The copyright (“message”) needs to be
embedded in the distributed version (“stegotext”) so that no
“attacker” with access to the stegotext will be able to produce a
“forgery” that resembles the covertext and yet does not contain the
embedded copyright message. The watermarking process (“encoding”)
should, of course, introduce little distortion so as to guarantee that
the stegotext closely resembles the original covertext.

Different messages may correspond to different possible owners,
versions, dates, etc. of the covertext, and it is thus of interest to
study the number of distinct messages that can be embedded if reliable
decoding is required from any reasonable forgery. The highest
exponential rate at which this number can grow in relation to the
covertext size is the coding value of the game. A precise statement of
this problem and some proofs can be found in [3].

II.  Watermarking  model 

The watermarking game can be described as follows. A source emits the
zero-mean variance-$\sigma_u^2$ IID length-$n$ covertext sequence $U$.
Independently of $U$, a copyright message $W$ is drawn uniformly over
the set $\mathcal{W}_n = \{1, \ldots, \lfloor 2^{nR} \rfloor\}$, where
$R$ is the rate of the system.

Using a secret key $\Theta_1$, which is independent of $U$ and $W$,
the encoder produces the stegotext
$X = X(U, W, \Theta_1) \in \mathbb{R}^n$. We require the encoder to
satisfy $\frac{1}{n}\|X - U\|^2 \le D_1$, a.s., where $D_1 > 0$ is a
given constant called the encoder distortion level, and a.s. stands
for “almost surely”.

The attacker, which is assumed to be ignorant of $U$ and $\Theta_1$,
produces a forgery $Y = Y(X, \Theta_2) \in \mathbb{R}^n$ based on $X$
and its own attack key $\Theta_2$. We similarly require the attacker
to satisfy $\frac{1}{n}\|Y - X\|^2 \le D_2$, a.s., where $D_2 > 0$ is
a given constant called the attacker distortion level.

The decoder produces an estimate $\hat{W}$ of the message $W$. In the
public version of the game, the decoder only uses the encoder’s secret
key and the forgery, so that $\hat{W} = \hat{W}(Y, \Theta_1)$. In the
private version of the game, the decoder also uses the covertext, so
that $\hat{W} = \hat{W}(Y, \Theta_1, U)$. We consider the probability
of error averaged over the covertext, message and both sources of
randomness, which is written $P_e(n) = \Pr(\hat{W} \ne W)$.

This research was supported in part by an NSF Graduate Fellowship
(A. Cohen) and by the NSF Faculty Early Career Development (CAREER)
Program (A. Lapidoth) at MIT. It was conducted in part at the
Institute for Signal and Information Processing, ETH.

We adopt a conservative approach to the watermarking game and assume
that once the watermarking system is employed, its details are made
available to the attacker. The attacker can thus optimize for the
encoder and decoder. This precludes the decoder from using the
maximum-likelihood decoding rule. We thus say that rate $R$ is
achievable if there exists a sequence of allowable rate-$R$ encoder
and decoder pairs such that for any sequence of allowable attackers,
$P_e(n)$ tends to zero as $n$ tends to infinity.

The value of the game is called the coding capacity, and it is the
supremum of all achievable rates. We write the coding capacity as
$C_{\mathrm{priv}}(D_1, D_2, \sigma_u^2)$ and
$C_{\mathrm{pub}}(D_1, D_2, \sigma_u^2)$ for the private and public
versions of the game, respectively.

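Theorem 1. For the Gaussian watermarking game,

$$C_{\mathrm{pub}}(D_1, D_2, \sigma_u^2) = C_{\mathrm{priv}}(D_1, D_2, \sigma_u^2).$$

If the interval

$$\mathcal{A}(D_1, D_2, \sigma_u^2) = \left[\max\left\{D_2, (\sigma_u - \sqrt{D_1})^2\right\},\ (\sigma_u + \sqrt{D_1})^2\right]$$

is empty, then $C_{\mathrm{priv}}(D_1, D_2, \sigma_u^2)$ is zero.
Otherwise,

$$C_{\mathrm{priv}}(D_1, D_2, \sigma_u^2) = \max_{A \in \mathcal{A}(D_1, D_2, \sigma_u^2)} [\ldots]$$

If expected rather than a.s. distortion constraints are used, then the
coding capacity for both versions is zero.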
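
The emptiness condition of Theorem 1 is simple to check numerically.
The Python sketch below computes the endpoints of the interval,
assuming it reads
$[\max\{D_2, (\sigma_u - \sqrt{D_1})^2\}, (\sigma_u + \sqrt{D_1})^2]$.
The function name is hypothetical, and the sketch does not evaluate
the capacity itself, since the maximand is not available in this text.

```python
import math


def admissible_interval(d1, d2, sigma_u2):
    """Return (lower, upper) for the interval of Theorem 1, or None if empty.

    Assumption: lower = max{D2, (sigma_u - sqrt(D1))^2} and
    upper = (sigma_u + sqrt(D1))^2, with sigma_u2 the covertext variance.
    """
    sigma_u = math.sqrt(sigma_u2)
    lower = max(d2, (sigma_u - math.sqrt(d1)) ** 2)
    upper = (sigma_u + math.sqrt(d1)) ** 2
    return (lower, upper) if lower <= upper else None
```

Under this reading the interval is empty exactly when the attacker
distortion level $D_2$ exceeds $(\sigma_u + \sqrt{D_1})^2$, in which
case the theorem gives zero capacity.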


Note that the optimal $A$ is a root of a cubic equation and hence a
closed form solution for the capacity exists. Different capacity
results for yet another version of this game with expected distortion
constraints and a decoder that knows the attack strategy (ML decoder)
have been recently reported in [1].

Abstract — A metering scheme is a method to count the number of
clients which visit each server. Naor and Pinkas [1] presented
metering schemes which allow the identification of servers that are
visited by at least a certain number $h$ of clients and which are
secure against attempts by servers to inflate the count of their
visits. In this paper we consider secure metering schemes for ramp
access structures. We provide lower bounds on the size of the
information given to clients and to servers and present a scheme
achieving these bounds.

I.  Introduction 

We consider a scenario where there are $n$ clients, $m$ servers and an
audit agency A whose task is to measure the interaction between the
$n$ clients and the $m$ servers in order to count the number of client
visits that any server receives. Our scenario contemplates the
existence of corrupt servers and corrupt clients which could cooperate
in order to inflate the count of the visits that a corrupt server
receives. Naor and Pinkas [1] proposed metering schemes as a means to
prevent servers from inflating the count of their visits. In their
schemes any server which is visited by a number of clients larger than
or equal to some threshold $h$ provides A with a short proof. The
metering scheme operates for at most $\tau$ time frames and is
supposed to be secure during these time frames. A metering scheme is
secure at a certain time frame $t$ if any server visited by fewer than
$h$ clients at that time frame has no information about its proof. In
our model the clients receive a certain amount of information from the
audit agency and give part of this information to the servers when
visiting them. Given the high complexity of such a distribution
mechanism, a natural step is to trade complexity for security. Hence,
we consider a more flexible situation where a server which receives
fewer than $h$ visits is able to gain some partial information about
its proof.

II.  Metering  Schemes  for  Ramp  Structures 

An $(n, m, \tau, c, s)$ metering system E consists of $n$ clients
$C_1, \ldots, C_n$ and $m$ servers $S_1, \ldots, S_m$, which are
active for a number $\tau$ of time frames and in which $c$ clients and
$s$ servers can be corrupt. A corrupt server can be assisted by
corrupt clients and other corrupt servers in order to inflate the
count of its visits. A corrupt client can donate to a corrupt server
the whole information it has received from A. A corrupt server can
donate to another corrupt server the information that it has so far
received from clients. A ramp structure indicates a pair of thresholds
$(\ell, h)$, where $1 < c < \ell < h < n$.

For $i = 1, \ldots, n$, $j = 1, \ldots, m$, $t = 1, \ldots, \tau$:
$C_i$ is the random variable associated with the information given by
A to $C_i$; $C_{i,j}^t$ is that associated with the information given
by $C_i$ to $S_j$ during a visit at time frame $t$; $X_{j(d_j)}^t$ is
that associated with the information received by $S_j$ at time frame
$t$ assuming it is visited by $d_j$ clients at that time frame;
$P_j^t$ is that associated with the proof generated by $S_j$ when it
is visited by at least $h$ clients during time frame $t$; and
$V_j^{[t]}$ is that associated with the information received by $S_j$
in time frames $1, \ldots, t$.
Definition II.1 Let E be an $(n, m, \tau, c, s)$ metering system.
An $(n, m, \tau, c, s)$ metering scheme for an $(\ell, h)$ ramp
structure is a distribution protocol of the proofs for the $m$ servers
in E in such a way that the following properties are satisfied:

1. $H(C_{i,j}^t | C_i) = 0$, for $i = 1, \ldots, n$,
$j = 1, \ldots, m$, $t = 1, \ldots, \tau$.

2. $H(P_j^t | X_{j(d_j)}^t) = 0$, for $d_j \ge h$,
$j = 1, \ldots, m$, $t = 1, \ldots, \tau$.

3. $H(P_1^t, \ldots, P_\beta^t | C_1 \ldots C_c X_{1(d_1)}^t \ldots X_{\beta(d_\beta)}^t V_1^{[t-1]} \ldots V_s^{[t-1]}) = H(P_1^t, \ldots, P_\beta^t)$,
for $d_j \le \ell - c$, $j = 1, \ldots, \beta$, $t = 1, \ldots, \tau$.

4. $H(P_1^t, \ldots, P_\beta^t | C_1 \ldots C_c X_{1(d_1)}^t \ldots X_{\beta(d_\beta)}^t V_1^{[t-1]} \ldots V_s^{[t-1]}) = \frac{1}{h - \ell} \sum_{j=1}^{\beta} (h - (c + d_j)) H(P_j^t | P_1^t \ldots P_{j-1}^t)$,
where $X_{j(d_j)}^t$ is associated with a set of visits to $S_j$ from
$d_j$ clients other than $C_1, \ldots, C_c$, $\ell < d_j + c < h$,
$j = 1, \ldots, \beta$ and $t = 1, \ldots, \tau$.



Lower  Bounds 

Theorem II.2 Let E be an $(n, m, \tau, c, s)$ metering system.
Let $S_1, \ldots, S_s$ denote the corrupt servers. In any metering
scheme for the ramp structure $(\ell, h)$ for E, it holds that
$H(C_i) \ge \frac{1}{h - \ell} \sum_{t=1}^{\tau} H(P_1^t, \ldots, P_s^t)$,
for $i = 1, \ldots, n$.
Theorem II.3 Let E be an $(n, m, \tau, c, s)$ metering system.
In any metering scheme for the ramp structure $(\ell, h)$ for E it
holds that $H(C_{i,j}^t) \ge \frac{1}{h - \ell} H(P_j^t)$, for any
$i = 1, \ldots, n$, $j = 1, \ldots, m$, and $t = 1, \ldots, \tau$.


A  Scheme  Achieving  our  Lower  Bounds 
Our  scheme  is  a  generalization  of  Shamir’s  scheme  [2]. 
Initialization: The audit agency A chooses $h - \ell$ polynomials
$P_1(y), \ldots, P_{h-\ell}(y)$ over $GF(q)$, where $q$ is a prime
number larger than $n + h - \ell$. For $r = 1, \ldots, h - \ell$,
$P_r(y)$ has degree $s\tau - 1$. Let $f_1, \ldots, f_{h-\ell}$ be
preselected elements of $GF(q)$ distinct from $1, \ldots, n$. Let
$Q(x, y)$ be a random bivariate polynomial over $GF(q)$ of degree
$h - 1$ in $x$ and degree $s\tau - 1$ in $y$, such that
$Q(f_r, y) = P_r(y)$, for $r = 1, \ldots, h - \ell$ (it is easy to
construct such a random polynomial by using Lagrange polynomials).
Then, A sends to each client $C_i$ the univariate polynomial
$Q(i, y)$, which is of degree $s\tau - 1$.

Regular Operation: When the client $C_i$ visits the server $S_j$ in
time frame $t$, it sends to $S_j$ the value $Q(i, j \circ t)$. The
argument $j \circ t$ denotes the concatenation of $j$ and $t$, and we
assume that $j \circ t$ is in $GF(q)$ and that no two distinct pairs
$(j, t)$ and $(j', t')$ are mapped to the same element.

Proof Generation: If the server $S_j$ has been visited by at least $h$
different clients in time frame $t$, then it can perform a Lagrange
interpolation and reconstruct the polynomial $Q(x, j \circ t)$. Then,
it computes $Q(f_r, j \circ t)$ for $r = 1, \ldots, h - \ell$. The
resulting $(h - \ell)$-tuple
$(P_1(j \circ t), \ldots, P_{h-\ell}(j \circ t))$ constitutes the
proof that the server sends to the audit agency.

Abstract — We give a conceptually simple proof for the capacity of the
exponential server queue. Our proof links the timing channel to the
point-process channel with complete feedback. This point-process
approach enables us to bound capacities of timing channels that arise
in multiserver queues, queues in tandem, and other simple
configurations.

The capacity of the exponential server queue with service rate $\mu$
packets per second is $e^{-1}\mu$ nats per second [1]. The capacity of
the point-process channel with maximum input intensity $\mu$ points
per second, and no background intensity, is also $e^{-1}\mu$ nats per
second (cf. [2], [3]). Furthermore, in both channels, the capacity
does not increase in the presence of complete feedback. In [1], the
connection between both channels in the presence of complete feedback
was discussed briefly. In [4], this connection was further explored.
It was shown that any strategy on the exponential server channel can
be mapped to an equivalent strategy that uses feedback on the
point-process channel. This observation implies that the capacity of
the exponential server channel is upper-bounded by the capacity of the
point-process channel with complete feedback, i.e., $e^{-1}\mu$ nats
per second.

From [1], we know that $e^{-1}\mu$ nats per second is indeed
achievable on the exponential server queue. In other words, although
the exponential server queue is only a particular case of a
point-process channel with feedback, it attains the point-process
channel capacity. In this paper, we provide insight on why there is no
loss in capacity.

To see the connection between the queue and the point-process channel,
fix a sequence of arrivals denoted by the counting process
$x = (x_t : t \in [0, T])$. Let $(Y_t : t \in [0, T])$ be the
corresponding counting process of departures from the single-server
queue of service rate $\mu$ packets per second. Then the state process
$(Q_t = x_t - Y_t : t \in [0, T])$ indicates the number of packets in
the queue as a function of time. Furthermore, the departure process
$(Y_t : t \in [0, T])$ is a self-exciting Poisson process with rate
$\lambda = (\lambda_t = \mu 1\{Q_{t-} > 0\} : t \in (0, T])$. Indeed,
if $Q_{t-} = 0$, no packet can depart at time $t$ ($\in (0, T]$) and
the instantaneous rate of the departure process is 0. If
$Q_{t-} > 0$, at least one packet is in the system at $t-$. Due to the
memoryless property of exponential service times, the residual time
for the next departure is exponentially distributed with mean
$1/\mu$ seconds, independent of the past, i.e., the instantaneous rate
of the departure process is $\mu$ at time $t$.
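
The description of the departure rate can be turned into a small
event-driven simulation. The Python sketch below writes `mu` for the
service rate and assumes, purely for illustration, that arrivals form
a Poisson process and that the queue starts empty; it returns the
fraction of time the queue is nonempty, which multiplied by `mu` is
the time average of the departure rate. The function name and
parameters are hypothetical.

```python
import math
import random


def simulate_busy_fraction(mu, arrival_rate, horizon, rng):
    """Fraction of [0, horizon] during which the queue length is positive."""
    t, queue, busy_time = 0.0, 0, 0.0
    while t < horizon:
        # Arrivals occur at all times; departures only when the queue is nonempty.
        rate = arrival_rate + (mu if queue > 0 else 0.0)
        dt = min(rng.expovariate(rate), horizon - t)
        if queue > 0:
            busy_time += dt
        t += dt
        if t >= horizon:
            break
        # Decide which kind of event fired, in proportion to the rates.
        if rng.random() < arrival_rate / rate:
            queue += 1
        else:
            queue -= 1
    return busy_time / horizon
```

With `arrival_rate = mu / e` the busy fraction should settle near
$e^{-1}$ over a long horizon, so the time-averaged departure rate is
close to $e^{-1}$ times the service rate.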

It is well-known that the sample function density (which plays the
role of probability density) given input $x$, is $p(x, y)$, where

$$p(x, y) = \exp\left\{\int_0^T [\log(\lambda_t)\, dy_t - \lambda_t\, dt]\right\}. \quad (1)$$

xThis  work  was  supported  in  part  by  the  National  Science  Foun¬ 
Furthermore, for a given probability measure on the input space, the
normalized mutual information is

$$\frac{1}{T} I_T(X; Y) = \frac{1}{T} E \int_0^T dt\, [\phi(\lambda_t) - \phi(\hat{\lambda}_t)], \quad (2)$$

where $\hat{\lambda}_t = E[\lambda_t | (Y_s : s \in [0, t))]$, for
each $t \in [0, T]$, and $\phi(u) = u \log u$ (see [2], [3], [5]). We
take $\phi(0) = 0$. Note that $\hat{\lambda}_t$ is an estimate of the
rate of the departure process given prior departures.

We can show the existence of codes that have vanishing probability of
error (as the observation interval $T$ increases without bound) at
rate $e^{-1}\mu$ nats per second. Here, for brevity, we only argue
that there is an input probability measure such that the normalized
mutual information equals the upper bound $e^{-1}\mu$ nats per second.
The input measure should induce the following properties to attain the
upper bound.

(a) $\lambda_t = 0$ or $\mu$.

(b) $(1/T) \int_0^T dt\, E[\lambda_t] = e^{-1}\mu$.

(c) $\lambda_t$ should be independent of prior departures
$(Y_s : s \in [0, t))$, and $E[\lambda_t]$ should be a constant over
time, i.e., $\hat{\lambda}_t = e^{-1}\mu$.

Let the input probability measure be a Poisson process with rate
$e^{-1}\mu$ packets per second. Let the queue be in equilibrium at
$t = 0$. We then have an M/M/1 queueing system. Property (a) holds
because $\lambda_t$ is $\mu$ times an indicator function. Property (b)
follows from ergodicity of the state process and the fact that the
queue is nonempty with probability $e^{-1}$. Property (c) holds by
Burke’s theorem (e.g., [5, V.T1]); the state of the queue $Q_t$ is
independent of prior departures $(Y_s : s \in [0, t))$ and therefore
so is $\lambda_t$.
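
To see why properties (a), (b) and (c) suffice, one can substitute
them into (2). The short calculation below is a sketch that writes
$\mu$ for the service rate and $\hat{\lambda}_t$ for the conditional
mean of the departure rate, and uses only $\phi(u) = u \log u$ with
$\phi(0) = 0$.

```latex
\begin{align*}
E[\phi(\lambda_t)]
  &= \Pr\{\lambda_t = \mu\}\,\mu \log \mu
   = e^{-1}\mu \log \mu
  && \text{by (a), (b) and } \phi(0) = 0,\\
\phi(\hat{\lambda}_t)
  &= e^{-1}\mu \,\log(e^{-1}\mu)
   = e^{-1}\mu\,(\log \mu - 1)
  && \text{by (c)},\\
\frac{1}{T} I_T(X;Y)
  &= \frac{1}{T}\int_0^T dt\,
     \bigl[e^{-1}\mu \log \mu - e^{-1}\mu(\log \mu - 1)\bigr]
   = e^{-1}\mu .
\end{align*}
```

The first line uses that a rate taking only the values 0 and $\mu$
with constant mean $e^{-1}\mu$ must equal $\mu$ with probability
$e^{-1}$ at each time.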

The point-process approach via (1), (2) and the filtering techniques
of [5] (to provide estimates of queue size) can be used to find
achievable rates of some simple networks of exponential servers. In
[6], lower bounds on the capacities of multiserver queues and two
queues connected in tandem are provided.

Abstract — This work focuses on covert timing channels, in which
information is conveyed in the timing of packets. Jamming strategies
and coding strategies are developed for various timing channel models.

I.  Introduction 

Information can be conveyed covertly using the timing of packet
transmissions, where the usage is covert because by design and common
usage, information in packet communication networks is conveyed only
by the bits within the packets. While there is no apparent way to
completely eliminate covert timing channels in a reliable
communications system (e.g., [1]), a delay device can be added to the
channel to jam covert timing communication. With an appropriate coding
and decoding scheme, a timing channel coder can still reliably
communicate in the presence of a jammer. For various channel models
and delay constraints on the jammer, the game between the jammer and
the coder is explored.

II.  Assumptions 

We assume that the mean number of packets per unit time transmitted by
the coder is constrained such that for a large fixed time, $T$, the
total number of arrivals is at most $\lambda T$ with probability one.
We take $T \to \infty$ and write $I$ for mutual information per unit
time. The coder is aware of the delay constraints placed on the
jammer, but is not aware of the actual strategy employed by the
jammer. We assume that no feedback is given to the coder.

The jammer can choose any delay strategy, including strategies that
change the packet ordering, subject to constraints on the delay.
However, the jammer cannot insert duplicate or additional packets
since this might impact the underlying packet communication system.
The delay constraints that we consider for jammers include a
Maximum-Delay-Less-than-D (MDLD) constraint, an Average-Delay-D (ADD)
constraint, and a Maximum-Buffer-Size-B (MBB) constraint.

III.  Channel  Models 

A  continuous  time  packet  model  and  a  discrete  time  packet 
model  are  considered. 

In the continuous time packet model, there are no lower bounds on the
spacing between initiations of packet transmissions so the coder or
the jammer can send multiple packets in a single instant. The only
restriction on the continuous time packet model is that neither the
coder nor the jammer can split a packet.

1 James Giles is supported by a Department of Defense NDSEG
Fellowship. This work was also supported by the National Science
Foundation under Grant ANR-99-80544.

In  the  discrete  time  packet  model,  time  is  slotted  and  both 
the  coder  and  the  jammer  can  transmit  zero  or  one  packets  in 
each  time  slot.  The  discrete  time  packet  model  is  a  tractable 
way  to  introduce  a  lower  bound  on  the  interpacket  spacing. 

Two  more  models  are  introduced  to  facilitate  analysis. 
These  models  have  fluid  flows  rather  than  packet  streams. 

IV.  Results 

We look for jamming strategies, $Q$, that satisfy
$\max_X I(X, Q) = \min_Q \max_X I(X, Q)$, and coding strategies, $X$,
that satisfy $\min_Q I(X, Q) = \max_X \min_Q I(X, Q)$, where
$I(X, Q)$ represents the mutual information per unit time between $X$
and the output of jammer $Q$ when $X$ is the input.

For the set of MDLD jammers in the continuous time packet model, we
have found a saddlepoint coding and jamming strategy, with mutual
information rate $\frac{1}{D} H(\mathrm{Geo}_0(\lambda D))$. For an
ADD jammer in the continuous time fluid model, we have shown that the
mutual information rate for a saddlepoint is between $0.55/D$ bits per
unit delay and $4/D$ bits per unit delay, if a saddlepoint exists. For
an MBB jammer in the discrete time packet model, we have upper and
lower bounds on the mutual information rate for a saddlepoint that are
within a factor of 2. The min-max and max-min capacities of the fluid
models are shown to dominate those of the packet models for several
scenarios.
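
For a sense of scale, the entropy term in the MDLD saddlepoint rate
can be evaluated numerically. The Python sketch below assumes that the
distribution in question is a geometric law on $\{0, 1, 2, \ldots\}$
with mean $\lambda D$ and that the rate is this entropy divided by
$D$; both readings are assumptions about the notation rather than
statements taken from the full paper, and the function names are
hypothetical.

```python
import math


def geometric_entropy_bits(mean):
    """Entropy in bits of a geometric law on {0, 1, 2, ...} with the given mean."""
    if mean <= 0:
        return 0.0
    # For P(k) = (1 - p) * p**k with p = mean / (mean + 1), the entropy is
    # (mean + 1) * log2(mean + 1) - mean * log2(mean).
    return (mean + 1) * math.log2(mean + 1) - mean * math.log2(mean)


def mdld_rate_bits(lam, d):
    """Assumed saddlepoint rate: entropy of the batch size per delay window d."""
    return geometric_entropy_bits(lam * d) / d
```

The geometric law is the maximum-entropy distribution on the
nonnegative integers for a given mean, which is one reason this
reading of the expression is plausible.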

For many of our results we assume that the coder and decoder have
access to a source of common randomness (they choose a code without
the jammer’s knowledge), and that the coder and decoder have access to
a common clock. However, for particular constraints and models, such
as an MDLD constraint in the continuous time packet model, we have
coding schemes that do not depend on these assumptions.

V.  More  Information 

For  more  information  and  a  complete  paper  see: 
http://www.comm.csl.uiuc.edu/~hajek. 

Abstract — We study information transmission through a finite buffer
channel modeled as a concatenation of a discrete memoryless channel
and a finite state erasure channel. The state of the erasure channel
is determined by the buffer occupancy upon arrival of the transmission
symbol; an erasure occurs when an input arrives to a full buffer. We
show that the capacity of the channel depends on the long-term loss
probability of the buffer and the capacity of the DMC. Thus, even
though the channel itself has memory, the capacity apparently depends
only on the stationary loss probability of the buffer. We also show
that delayed feedback does not help in this channel. We also study the
channel as a deletion channel where we do not know where the erasures
have occurred.

I.  Summary 

We  propose  a  channel  abstraction  for  the  finite-buffer  channel 
and  study  its  capacity.  This  model  is  motivated  by  packet- 
switched  networks,  where  a  packet  is  queued  in  a  finite  buffer 
on  each  router  along  its  path  through  the  network.  A  packet 
can  be  dropped  because  of  buffer  overflow,  or  corrupted  due 
to  transmission  errors.  We  do  not  consider  coding  in  inter¬ 
arrival  times  in  this  abstraction1.  Note  that  the  sender  may 
have  control  over  the  long-term  packet  arrival  rate,  which  af¬ 
fects  the  loss  process  at  the  buffer;  however,  there  is  no  side 
information  transmitted  using  the  arrival  process. 

We  formulate  this  problem  as  transmission  over  a  finite 
state  channel  where  the  transitions  of  the  finite  state  channel 
occur  due  to  arrivals  and  departures  of  packets  to  the  buffer. 
The  model  considered  resembles  the  problem  of  transmission 
through  finite  state  channels  studied  extensively  [2].  But  one 
of the differences is that the state process need not be Markovian
(see Figure 1). In this paper we consider only a single
user’s  packets  arriving  at  the  buffer  and  the  buffer  state  is 
affected  by  the  arrivals  of  that  user. 


Figure  1:  Finite-state  channel  model. 

1 This is conjectured due to the result in [1] that coding in interarrival
times is unnecessary when the alphabet size of the transmitted
symbol is large (packet sizes in current networks range from a few
tens of bytes to a few thousand). Though this was proved in the
context of infinite buffer channels, we believe that this is true in
our case as well.

We first consider the problem where the receiver knows
when a packet is dropped. In practice, this is done using a
sequence number associated with packets. Later we study the
channel  where  this  is  not  known  and  model  it  as  a  deletion 
channel.  Under  regularity  conditions  on  the  state  transition 
process  we  can  prove  a  coding  theorem  for  the  proposed  chan¬ 
nel  model  [3].  We  show  that  though  this  channel  has  memory, 
the  capacity  is  determined  by  the  long  term  stationary  loss 
probability of the buffer. That is, the capacity is the product
of the capacity of the DMC and the long-term probability
that a packet gets through. This shows that even
though  the  finite  buffer  channel  has  complicated  memory,  its 
capacity  behavior  is  akin  to  a  simple  erasure  channel. 

Proposition 1.1 Under mixing and asymptotic mean stationarity
conditions on the state process $\{Q_i\}$, the capacity
of the finite buffer channel is given by

$$C = \lim_{n\to\infty} C_n = C_0 \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} P\{Q_i \neq B\}, \qquad (1)$$

where B denotes the full buffer state and $C_0$ is the capacity of
the DMC. Furthermore, capacity can be achieved by an i.i.d.
input process $\{X_i\}$.
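
To make the product form in Proposition 1.1 concrete, the following Python sketch estimates the long-run fraction of arrivals that find the buffer not full and scales a DMC capacity by it. The queue dynamics (one arrival per step, at most one departure per step with probability `depart_prob`), the choice of a binary symmetric channel as the DMC, and all function names are illustrative assumptions, not the model analyzed in this summary.

```python
import math
import random


def bsc_capacity(p):
    """Capacity (bits per use) of a binary symmetric channel with crossover p."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)


def fraction_not_full(n, buffer_size, depart_prob, seed=0):
    """Empirical (1/n) * sum_i 1{Q_i != B} for a toy queue (assumed dynamics)."""
    rng = random.Random(seed)
    q = 0          # buffer occupancy seen by the next arrival
    accepted = 0   # arrivals that found the buffer not full
    for _ in range(n):
        if q < buffer_size:
            accepted += 1
            q += 1
        # otherwise the arriving packet is dropped (an erasure)
        if q > 0 and rng.random() < depart_prob:
            q -= 1
    return accepted / n


def capacity_estimate(c0, n, buffer_size, depart_prob, seed=0):
    """C is approximated by C0 times the empirical fraction of non-full arrivals."""
    return c0 * fraction_not_full(n, buffer_size, depart_prob, seed)
```

For instance, `capacity_estimate(bsc_capacity(0.1), 100000, 4, 0.6)` gives an estimate in bits per packet for this toy queue; the value depends entirely on the assumed dynamics.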

This  capacity  is  expressed  in  bits  per  packet.  This  can 
be  translated  to  a  transmission  rate  (bits/second)  by  taking 
into  account  the  packet  arrival  process,  based  on  some  ergodic 
conditions  on  the  arrival  process.  Note  that  the  average  packet 
arrival  rate  can  be  chosen  to  maximize  this  transmission  rate. 

We  also  studied  the  case  where  there  is  feedback  available 
from  the  channel  output  to  the  transmitter,  delayed  by  at  least 
one  symbol.  We  showed  that  feedback  in  this  case  does  not 
improve  the  channel  capacity  even  though  the  channel  could 
have  complicated  memory2. 

Finally  we  study  a  model  of  transmission  in  the  absence 
of  sequence  numbers  on  the  packets.  This  can  be  studied  as 
a  deletion  channel.  Similar  problems  have  arisen  in  the  con¬ 
text  of  transmission  in  the  presence  of  synchronization  errors, 
studied  in  [4]  among  others.  This  is  a  difficult  problem  in  gen¬ 
eral  and  we  study  specific  deletion  models  and  develop  some 
bounds  for  achievable  performance. 

Abstract — The Burrows-Wheeler transform is a
block-sorting  algorithm  which  has  been  shown  empir¬ 
ically  to  be  useful  in  compressing  text  data.  In  this 
paper  we  study  the  output  distribution  of  the  trans¬ 
form  for  i.i.d.  sources,  tree  sources  and  stationary 
ergodic  sources.  We  can  also  give  analytic  bounds 
on  the  performance  of  some  universal  compression 
schemes which use the Burrows-Wheeler transform.

I.  Introduction 

Burrows  and  Wheeler  [2]  proposed  a  lossless  transformation 
which  they  showed  (with  empirical  evidence)  to  be  useful  for 
the  lossless  compression  of  data.  Recently  there  has  been 
increasing  interest  in  understanding  and  improving  the  per¬ 
formance  of  data  compression  algorithms  using  the  Burrows- 
Wheeler  transform  (BWT).  From  empirical  evidence  [2]  it  ap¬ 
pears  that  compression  methods  using  this  transform  achieve 
better  performance  than  Lempel-Ziv  techniques,  while  not  be¬ 
ing  computationally  as  intensive  as  compression  methods  us¬ 
ing  statistical  modeling  techniques.  While  there  has  been  a 
large  amount  of  empirical  evidence  to  show  the  efficacy  of  the 
transform  (e.g.,  [2],  [3]),  the  analysis  of  the  compression  ef¬ 
ficiency  of  methods  based  on  the  transform  has  received  less 
attention.  Sadakane  [5],  Arimura  and  Yamamoto  [6],  Balken- 
hol  and  Kurtz  [4]  and  Effros  [1]  have  provided  the  first  steps 
in  this  direction. 

In  this  paper  we  investigate  the  joint  distribution  at  the 
output of the Burrows-Wheeler transform. For various classes
of  input  sources,  we  show  that  the  output  distribution  of  the 
transform  is  approximately  memoryless  and  piecewise  station¬ 
ary,  in  the  sense  that  the  normalized  divergence  between  the 
output  distribution  and  a  memoryless  and  piecewise  station¬ 
ary  distribution  is  small.  Thus  coding  schemes  that  are  good 
for  memoryless,  piecewise  stationary  sources  can  be  used  to 
give  good  coding  performance.  We  also  derive  bounds  on  the 
coding  rate  for  some  data  compression  algorithms  that  use 
the  BWT.  The  schemes  that  we  analyze  were  also  analyzed  in 
[1]  where  bounds  were  obtained  on  average  code  length.  The 
bounds  we  give  are  on  individual  sequences. 

II.  Main  Result 

We now introduce some notation so that we can precisely state
our main result. We consider a Markov source X taking values in
an alphabet $\mathcal{A}$ whose set of states $\mathcal{S}$ is a
complete and prefix-free subset of $\mathcal{A}^*$. Let $|\mathcal{S}| = k$ and label
the states $s_1, s_2, \ldots, s_k$ in lexicographic order. We assume
that the Markov source is irreducible and aperiodic. Let the
steady state probability of a state $s \in \mathcal{S}$ be denoted by $\pi(s)$
and let $P(a|s)$ denote the probability that $a \in \mathcal{A}$ occurs when
we are in state $s \in \mathcal{S}$. Let $C(i) = \sum_{j=1}^{i} \pi(s_j)$. We will
show that the divergence between the output distribution and
a memoryless, piecewise stationary distribution with $k - 1$
transitions is small. Let $T_1, T_2, \ldots, T_{k+1}$ be integers defined
by $T_i = \lfloor C(i-1)\, n \rfloor + 1$. Note that $C(0) = 0$ and so $T_1 = 1$.
Let us now define a memoryless distribution $Q_n$ with $k - 1$
changes in distribution, by

$$Q_n(y_1, \ldots, y_n) = \prod_{j=1}^{k} \prod_{i=T_j}^{T_{j+1}-1} P(y_i \mid s_j).$$

We show that the output distribution is close to the distribution $Q_n$.

1 This work was partially supported by the National Science
Foundation under Grants NYI Award IRI-9457645 and NCR 9523805.

Theorem 1 Consider a tree source for which $P(a|s) > 0$ for
all $a \in \mathcal{A}$, $s \in \mathcal{S}$, with entropy rate H. Let $X^n$ be the output of
the tree source in steady state, $Y^n = \psi_{\mathrm{BWT}}(\mathcal{R}(X^n))$, and let $P_{Y^n}$
denote the distribution of $Y^n$. Then

$$\frac{1}{n} D(P_{Y^n} \,\|\, Q_n) \le \frac{\ldots}{\sqrt{n}}$$

for some constant c, where $\mathcal{R}$ is a map from a string to its
reverse and $\psi_{\mathrm{BWT}}$ is a map from a string to the string part of
its Burrows-Wheeler Transform.
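
The objects in this section can be sketched in a few lines of Python. The block below computes the string part of the Burrows-Wheeler transform by sorting all rotations (a simple textbook construction chosen for clarity, not an efficient one and not necessarily the variant used by the authors), applies it to the reversed input, and computes segment boundaries of the form floor(C(i-1) n) + 1 from a list of state probabilities. The function names and the use of `Fraction` for exact arithmetic are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def bwt_string(s):
    """String part of the BWT: last column of the lexicographically sorted rotations."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def bwt_of_reverse(x):
    """Apply the transform to the reversed string, as in Y = BWT(reverse(X))."""
    return bwt_string(x[::-1])


def segment_starts(pi, n):
    """Boundaries T_1..T_{k+1} with T_i = floor(C(i-1) * n) + 1.

    pi lists the steady-state probabilities of the states in lexicographic
    order; C(i) is their running sum, with C(0) = 0.
    """
    cumulative = Fraction(0)
    starts = [1]                      # T_1 = floor(0) + 1
    for p in pi:
        cumulative += Fraction(p)
        starts.append(floor(cumulative * n) + 1)
    return starts
```

On a long sample from a source with few states, the output of `bwt_of_reverse` tends to show runs whose symbol statistics change near the positions returned by `segment_starts`, which is the piecewise stationary behaviour the theorem quantifies.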

The  assumption  that  P(a|s)  >  0  for  all  a,  s  can  be  removed 
and  a  result  similar  to  the  one  above  can  be  given.  A  re¬ 
sult  similar  in  spirit  to  the  one  above  can  also  be  shown  for 
stationary  ergodic  sources. 

Finally,  we  mention  that  we  have  also  analyzed  various 
methods to compress the output of the BWT and obtained
bounds  on  their  performance.  These  results  are  like  those  in 
[1]  except  that  we  obtain  results  for  individual  sequences. 

Abstract — We apply complexity regularization to
statistical  ill-posed  inverse  problems  in  imaging.  We 
formulate  a  natural  distortion  measure  in  image  space 
and  develop  nonasymptotic  bounds  on  estimation  per¬ 
formance  in  terms  of  an  index  of  resolvability  that 
characterizes  the  compressibility  of  the  true  image. 
These  bounds  extend  previous  results  that  were  ob¬ 
tained  under  simpler  observational  models. 

I.  Statement  of  the  Problem 

In imaging problems such as tomography, astronomical
imaging, ultrasound imaging, radar imaging, forensic science,
and image restoration, a statistical model relating the observations
to the underlying image is often available [1]. Consider
a penalized-likelihood approach to statistical imaging:
$\hat{f}(y) = \arg\min_f [-\ln p(y|f) + \mu \Phi(f)]$, where $p(y|f)$ is the
conditional density relating the observations $y \in \mathcal{Y}$ to the
unknown image $f \in \mathcal{F}$, and $\Phi(f)$ is the regularization functional,
which penalizes “unlikely” estimates and stabilizes the
ML estimator. The regularization parameter $\mu$ controls the
trade-off between the log-likelihood term and the regularization
penalty. The choice of $\Phi(f)$ depends on available a priori
knowledge. L1, Besov, total-variation and robust smoothness
penalties are state of the art in image processing.

In this paper, we investigate the choice of complexity measures
for the regularization penalty $\Phi(f)$. Such penalties favor
estimates with low complexity in a data compression sense.
Compared to the more standard L2, L1 and Besov penalties,
complexity regularization penalizes unlikely estimates in
a more flexible way, as complexity measures may be based
on rather sophisticated, possibly implicit, flexible probability
models. The complexity-regularization criterion is stated
as $\hat{f}(y) = \arg\min_{f \in \Gamma} [-\ln p(y|f) + \mu L(f)]$, where $\Gamma$ is a discrete
set of candidate images, informally referred to as a codebook.
Complexity is measured by a codelength $L(f)$ associated
with each $f \in \Gamma$. Codelengths should satisfy Kraft’s inequality
$\sum_{f \in \Gamma} e^{-L(f)} \le 1$. The MDL principle [2] is a familiar
instance of complexity regularization, where $\mu = 1$.
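
A minimal sketch of the complexity-regularized criterion, assuming an additive white Gaussian noise model and a tiny hand-made codebook: each candidate is scored by its negative log-likelihood plus the weighted codelength, and the minimizer is returned. The codebook, the codelengths (in nats, chosen so that the Kraft sum equals one), the observation vector, and the function name are all hypothetical and serve only to illustrate the trade-off.

```python
import math


def complexity_regularized_estimate(y, codebook, sigma, mu):
    """Return the codebook image minimizing -ln p(y|f) + mu * L(f) under AWGN.

    The additive constant of the Gaussian log-likelihood is dropped because
    it is the same for every candidate f.
    """
    def objective(entry):
        f, codelength = entry
        neg_log_lik = sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / (2 * sigma ** 2)
        return neg_log_lik + mu * codelength

    return min(codebook, key=objective)[0]


# Hypothetical codebook: (candidate image, codelength in nats).
codebook = [
    ((0, 0, 0, 0), math.log(2)),
    ((1, 1, 1, 1), math.log(4)),
    ((0, 0, 1, 1), math.log(8)),
    ((1, 1, 0, 0), math.log(8)),
]

# Kraft sum: 1/2 + 1/4 + 1/8 + 1/8.
kraft_sum = sum(math.exp(-length) for _, length in codebook)

y = (0.1, -0.2, 0.9, 1.1)  # illustrative noisy observations
```

With a large noise level the penalty dominates and the cheapest candidate is preferred; with a small noise level the data term dominates and the better-fitting but costlier candidate is selected.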

The  use  of  MDL  and  complexity  regularization  has  found 
theoretical  justification  in  a  variety  of  inference  problems 
[3,  4,  5].  Extending  such  analysis  to  problems  of  interest  in 
imaging  entails  several  technical  difficulties.  First,  the  data 
are  not  identically  distributed.  Second,  the  bounds  derived 
by  extension  of  the  techniques  in  [3,  4,  5]  are  often  too  large 
to  be  useful  in  practical  imaging  problems. 

Consider the relative-entropy loss $d(f^*, f) = \frac{1}{N} D(p(y|f^*) \,\|\, p(y|f))$
for $f^*, f \in \mathcal{F}$, where $D(p\|q) = \int_{\mathcal{Y}} p(y) \ln \frac{p(y)}{q(y)}\, dy$.
The estimation risk is defined as
$r(f^*, \hat{f}) = E[d(f^*, \hat{f})]$, where the expectation is with
respect to $p(y|f^*)$. Relative-entropy loss is the natural choice
to characterize the performance of penalized likelihood estimators.
This loss becomes a squared-error loss for additive
white Gaussian noise (AWGN) models, and an I-divergence
loss for Poisson noise models. If $d(f^*, f) = 0$ for some $f \neq f^*$,
then $f^*$ is not identifiable. For ill-posed problems, the class
of images $C_\epsilon(f^*) = \{f : d(f^*, f) < \epsilon\}$ is large for any $\epsilon > 0$.

1 Work supported by the National Science Foundation under
award MIP-9732995 (CAREER), by ARO under contract numbers
ARO DAAH-04-95-1-0494 and ARMY WUHT-01 1398-SI, and
by DARPA under Contract F49620-98-1-0498, administered by
AFOSR.

II.  Upper  Bounds  on  Estimation  Performance 

We now give upper bounds on the estimation loss and risk. See [6] for
more details. Define the index of resolvability $R_\mu(f^*) =
\min_{f \in \Gamma} [d(f^*, f) + \ldots]$, $f^* \in \mathcal{F}$. This quantity describes
how well $f^*$ can be approximated in the relative-entropy sense
by a moderately-complex element of the codebook $\Gamma$.

The upper bounds are essentially proportional to the index
of resolvability, with a very small ($O(1/N)$) additive constant.
For the AWGN model $y_i = f_i^* + w_i$, $1 \le i \le N$, $w_i \sim$ i.i.d.
$\mathcal{N}(0, \sigma^2)$, we have Theorem 1 below, which applies to any
$\mu > 1$ (recall $\mu = 1$ is the MDL choice). The techniques used
in  [4],  which  do  not  require  knowledge  of  the  noise  distribu¬ 
tion  but  assume  that  Bernstein’s  inequality  applies  to  that 
distribution,  provide  looser  bounds. 

Theorem  1  For  any  p  >  1  and  p  >  0,  the  loss  of  the 
complexity-regularized  estimator  f  under  the  AWGN  model 

satisfies  Pr  [ d{f*  J)  <  +  ^=Tyiv]  >  1  -  V-  The 

risk  is  upper-bounded  by  E[d(f*,f)}  <  %z\Rp(f*)  +  ■ 

For  some  non-Gaussian  models,  under  certain  large-sample 
assumptions,  log-likelihood  ratios  are  asymptotically  normally 
distributed,  and  tight  inequalities  can  be  obtained  again.  Un¬ 
der  some  additional  technical  assumptions,  the  first  bound  of 
Theorem  1  still  applies,  provided  that  the  inequality  is  re¬ 
placed  with  an  asymptotic  inequality. 

Abstract — Pairs of binary pilot symbol sequences
are  jointly  designed  to  minimize  an  introduced  merit 
factor  whose  minimization  leads  to  the  reduction  in 
Cramer-Rao  lower  bound  (CRLB)  for  the  “two-sided” 
intersymbol  interference  channel  estimation. 

I.  Introduction 

It  is  a  common  approach  to  periodically  insert  known  sym¬ 
bols  in  order  to  reliably  estimate  the  channel  parameters  prior 
to detection. In the case of time-variant multi-path fading
channels  where  the  path  delay  spread  is  on  the  order  of  sev¬ 
eral  symbols  or  larger,  pilot  symbol  blocks  that  span  the 
channel  memory  need  to  be  inserted.  In  deriving  optimal, 
or  some  decision-feedback  detection  and  channel  estimation 
algorithms,  the  signal  is  frequently  assumed  to  be  quasi-static 
in  an  interval  encompassing  a  number  of  transmitted  symbols. 

Here  it  is  assumed  that  both  pilot  symbol  blocks  (pream¬ 
ble  and  postamble)  that  frame  a  block  of  data  (see  Fig.  1)  are 
employed  for  estimation  of  the  (quasi-static)  channel  coeffi¬ 
cients  pertaining  to  a  particular  data  block.  This  approach 
we term “two-sided” channel estimation. It is shown that the constructed
optimal sequences for two-sided channel estimation
require  that  the  two  pilot  symbol  blocks  framing  a  data  block 
almost  always  differ  and,  therefore,  the  optimal  signaling  re¬ 
quires  alternating  periodically  inserted  training  blocks. 




Figure  1:  Two-sided  pilot  symbol  block  insertion. 


II.  Signal  Model 

A symbol-spaced received signal is assumed and a normalized
block of received samples over which the channel is (quasi-)static
can be expressed as follows:

$$r = Ah + n.$$

n is a sample vector of a white circular Gaussian noise process
with a two-sided PSD $N_0/E_s$, where $E_s$ is the symbol energy;
h is an $L \times 1$ vector of channel coefficients. A is a Toeplitz
matrix corresponding to the transmitted sequence of symbols
from {+1, -1} of the form $A = [P_1^T \; D^T \; P_2^T]^T$. $P_1$ and $P_2$ are
N by L pilot symbol Toeplitz submatrices consisting of only
preamble and postamble symbol sequences of length (N + L - 1)
and no data symbols. D is a (D + L - 1) by L submatrix that
holds all the data symbols. N > L is assumed so that each
pilot-symbol block spans the channel.

III.  Mini-Max  Criterion  and  Optimal  Sequences 

The  CRLB  of  the  two-sided  ML  channel  estimation  based 
on the “two-sided” pilot-symbol matrix $P = [P_1^T \; P_2^T]^T$ is


where $R = P^H P = P_1^H P_1 + P_2^H P_2$. Instead of directly minimizing
$\mathrm{tr}\{R^{-1}\}$ we suggest minimizing the largest absolute
sum

$$\rho_{\max} = \max_i \sum_{j \neq i} |\rho_{ij}|,$$

where $\rho_{ij}$ is the (i, j)-th element of R. Minimization of $\rho_{\max}$ is
equivalent to the minimization of the maximum Gerschgorin
disc radius of R. Thus, it attempts a reduction in the eigenvalue
spread and forces the matrix R to have a form which is
as close as possible to the diagonal form.

When $\rho_{\max} = 0$ the Grammian matrix is $R = 2N \cdot I$, where
I is the identity matrix, and the ML channel estimation achieves
the absolute minimum variance lower bound. Binary
odd- and even-periodic complementary sequence ([1], [3]) pairs
achieve $\rho_{\max} = 0$ and, thus, are optimal for “two-sided” ISI
channel estimation for even N > L.
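
The role of the largest absolute off-diagonal row sum can be made explicit with a short worked step. It is a sketch under two assumptions that follow from the signal model but are not spelled out in the text: every diagonal entry of R equals 2N (two pilot blocks of N rows of ±1 symbols), and the CRLB is proportional to the trace of the inverse of R. Here $\lambda_i$ denotes the eigenvalues of R. Gerschgorin's theorem confines each eigenvalue to a disc of radius at most the largest off-diagonal row sum around 2N, and the harmonic-arithmetic mean inequality gives the lower bound:

```latex
\lambda_i \in \bigl[\, 2N - \rho_{\max},\; 2N + \rho_{\max} \,\bigr], \quad i = 1, \ldots, L
\quad\Longrightarrow\quad
\frac{L}{2N}
\;\le\;
\operatorname{tr}\{R^{-1}\} = \sum_{i=1}^{L} \frac{1}{\lambda_i}
\;\le\;
\frac{L}{2N - \rho_{\max}}
\qquad (\rho_{\max} < 2N).
```

The lower bound uses $\sum_i \lambda_i = \operatorname{tr}\{R\} = 2NL$ and is met exactly when all eigenvalues equal 2N, that is, when R is 2N times the identity. The upper bound shows that any pair with off-diagonal row sums below 2N keeps R non-singular and the trace bounded.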

When N is odd, $\rho_{\max} = 0$ (and the corresponding minimum CRLB)
cannot be achieved. For a subset of odd N “almost-
complementary” periodic binary sequence pairs achieve the
minimum possible $\rho_{\max} = 2\lfloor \ldots \rfloor$. Additionally, “good” sequence
pairs achieve $\rho_{\max} = 4\lfloor L/2 \rfloor < 2N$, which assures
that R is non-singular and, consequently, that the CRLB is
bounded. Given a generator sequence $u = [u_0, \ldots, u_{N-1}]$,
both almost-complementary and good sequence pairs ($p_1 =
[p_{1,0}, \ldots, p_{1,N+L-2}]$, $p_2 = [p_{2,0}, \ldots, p_{2,N+L-2}]$) are formed as
follows:

$$p_{1,k} = u_{k \bmod N} \quad \text{and} \quad p_{2,k} = (-1)^k\, p_{1,k},$$

for $0 \le k \le N + L - 2$. For almost-complementary sequences
the periodic autocorrelation of the periodic extension $u_p$ of u
is $\bigl|\sum_{k=0}^{N-1} u_{p,k}\, u_{p,k+l}\bigr| = 1$ for $0 < l \le N - 1$. That is, they can be
formed from m-, Barker, Legendre, and twin-prime sequences
(see  e.g.  [2]).  “Good”  sequences  are  based  on  sequences  given 
in  [4]  whose  periodic  autocorrelation  has  values  in  {1,  —3}. 
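
The construction of a pilot pair from a generator sequence can be checked numerically with a small sketch. It assumes the alternating-sign rule p2[k] = (-1)^k p1[k] and one particular Toeplitz convention (row i of an N by L block holds the window p[i+L-1], ..., p[i]); both the convention and the function names are assumptions made for illustration. The example uses the length-7 Barker sequence, whose periodic autocorrelation has magnitude one at every nonzero lag.

```python
def pilot_pair(u, L):
    """Build p1[k] = u[k mod N] and p2[k] = (-1)^k * p1[k] for 0 <= k <= N + L - 2."""
    N = len(u)
    p1 = [u[k % N] for k in range(N + L - 1)]
    p2 = [(-1) ** k * p1[k] for k in range(N + L - 1)]
    return p1, p2


def toeplitz_block(p, N, L):
    """N x L block whose row i is p[i+L-1], ..., p[i] (assumed convention)."""
    return [[p[i + L - 1 - j] for j in range(L)] for i in range(N)]


def gram(P1, P2):
    """R = P1^T P1 + P2^T P2 for real-valued pilot blocks."""
    L = len(P1[0])
    return [[sum(r[i] * r[j] for r in P1) + sum(r[i] * r[j] for r in P2)
             for j in range(L)] for i in range(L)]


def rho_max(R):
    """Largest sum of absolute off-diagonal entries over the rows of R."""
    L = len(R)
    return max(sum(abs(R[i][j]) for j in range(L) if j != i) for i in range(L))


u = [1, 1, 1, -1, -1, 1, -1]          # length-7 Barker sequence
N, L = len(u), 3
p1, p2 = pilot_pair(u, L)
R = gram(toeplitz_block(p1, N, L), toeplitz_block(p2, N, L))
```

In this example the odd-lag entries of R cancel between the two blocks and only the even-lag entries survive, which is the mechanism that keeps the off-diagonal row sums small.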

Abstract — We obtain a general form of the multivariate
Rayleigh and exponential probability density
functions  (p.d.f.s)  when  these  are  generated  by  corre¬ 
lated  Gaussian  random  variables.  A  general  expres¬ 
sion  for  the  exponential  characteristic  function  (c.f.) 
is  also  derived. 

I.  Introduction 

Multivariate  Rayleigh  and  exponential  distributions  [1]  arise 
in  the  performance  analysis  of  digital  modulation  schemes  over 
correlated  Rayleigh  fading  channels  using  diversity  combining 
techniques.  The  Rayleigh  distribution  is  a  special  case  of  the 
Nakagami  distribution,  while  the  exponential  is  a  special  case 
of  the  gamma.  A  bivariate  Rayleigh  case  [2]  has  been  applied 
to fading channels using dual diversity [3]. A multivariate
gamma  case  has  been  dealt  with  in  situations  in  which  the  c.f. 
has  a  specific  form  [4]  [5].  Here  we  obtain  a  general  form  of  the 
multivariate  Rayleigh  and  exponential  p.d.f.s  when  these  are 
generated  by  correlated  Gaussian  random  variables.  We  also 
derive  a  general  expression  for  the  exponential  c.f. 

II.  Probability  Density  Functions 

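Consider zero-mean real Gaussian $L \times 1$ random vectors
$X_c \triangleq [X_{c1}, \ldots, X_{cL}]^T$ and $X_s = [X_{s1}, \ldots, X_{sL}]^T$,
with covariance matrices $K_{cc}$ and $K_{ss}$ and cross-covariance
matrix $K_{cs}$, such that

$$E[X_{ci}^2] = (K_{cc})_{ii} = (K_{ss})_{ii} = E[X_{si}^2],$$
$$E[X_{ci} X_{si}] = (K_{cs})_{ii} = 0, \quad i = 1, \ldots, L.$$

In other words, $X_{ci}$ and $X_{si}$ are i.i.d. Gaussian for each i.
Define random variables $\alpha_1, \ldots, \alpha_L$ and $\Phi_1, \ldots, \Phi_L$ as
$\alpha_i = (X_{ci}^2 + X_{si}^2)^{1/2}$, $\Phi_i = \tan^{-1}(X_{si}/X_{ci})$, $i = 1, \ldots, L$. Let
$\alpha = [\alpha_1, \ldots, \alpha_L]^T$ be a Rayleigh random vector. Denote

$$K \triangleq \begin{bmatrix} K_{cc} & K_{cs} \\ K_{cs}^T & K_{ss} \end{bmatrix}, \qquad K^{-1} = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix},$$

such that $A = [A_{ij}]_{i,j=1}^{L}$, $D = [D_{ij}]_{i,j=1}^{L}$, $B = [B_{ij}]_{i,j=1}^{L}$.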

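From the joint p.d.f. of $(X_c, X_s)$, the p.d.f. of $\alpha$, which is
multivariate Rayleigh, is given by

$$f_\alpha(u) = \frac{\prod_{i=1}^{L} u_i}{(\det(K))^{1/2} (2\pi)^L} \int_0^{2\pi} \!\cdots\! \int_0^{2\pi} \exp\left\{ -\tfrac{1}{2}\, g(u, \phi) \right\} d\phi_1 \cdots d\phi_L, \quad u \ge 0,$$

where

$$g(u, \phi) = \sum_{i=1}^{L} \left( A_{ii} \cos^2\phi_i + D_{ii} \sin^2\phi_i + 2 B_{ii} \cos\phi_i \sin\phi_i \right) u_i^2
+ \sum_{\substack{i,j=1 \\ i \neq j}}^{L} \left( A_{ij} \cos\phi_i \cos\phi_j + D_{ij} \sin\phi_i \sin\phi_j + 2 B_{ij} \cos\phi_i \sin\phi_j \right) u_i u_j,$$

$$u = [u_1, \ldots, u_L]^T, \quad \phi = [\phi_1, \ldots, \phi_L]^T. \qquad (1)$$

The p.d.f. of the exponential random vector $\gamma$ given by
$\gamma = [\gamma_1, \ldots, \gamma_L]^T = [\alpha_1^2, \ldots, \alpha_L^2]^T$ can be obtained from
the multivariate Rayleigh p.d.f. (1).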

When $X_c + jX_s$ is a circular complex Gaussian random
vector satisfying $E[(X_c + jX_s)(X_c + jX_s)^T] = 0$, we have,
for $i, j = 1, \ldots, L$,

$$A_{ij} = A_{ji} = D_{ji} = D_{ij}, \quad B_{ij} = -B_{ji} \ \text{when } i \neq j, \quad B_{ii} = 0$$

in (1), and therefore

$$g(u, \phi) = \sum_{i=1}^{L} A_{ii} u_i^2 + 2 \sum_{\substack{i,j=1 \\ i<j}}^{L} \left( A_{ij}^2 + B_{ij}^2 \right)^{1/2} u_i u_j \cos(\phi_i - \phi_j + \theta_{ij}),$$

where $\theta_{ij} = \tan^{-1}(B_{ij}/A_{ij})$. Further, if $X_c$ and $X_s$ are i.i.d.
zero-mean Gaussian random vectors, then B = 0.

III.  Characteristic  Function 

Although  it  is  difficult  to  obtain  a  closed-form  expression  for 
the  multivariate  Rayleigh  c.f.,  the  multivariate  exponential  c.f. 
can  be  expressed  in  closed  form  as 

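$$\Psi_\gamma(j\omega) = \left\{ \det\left( I_L - 2j\,\mathrm{diag}(\omega) K_{cc} \right) \right\}^{-1/2}
\times \left\{ \det\left( \left[ I_L - 2j\,\mathrm{diag}(\omega) K_{ss} \right] + 4\,\mathrm{diag}(\omega) K_{cs}^T
\left[ I_L - 2j\,\mathrm{diag}(\omega) K_{cc} \right]^{-1} \mathrm{diag}(\omega) K_{cs} \right) \right\}^{-1/2}, \qquad (2)$$

where $I_L$ is the $L \times L$ identity matrix.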

If we have the condition that $X_c$ and $X_s$ are i.i.d. random
vectors, then the c.f. (2) simplifies to

$$\Psi_\gamma(j\omega) = \left\{ \det\left( I_L - 2j\,\mathrm{diag}(\omega) K_{cc} \right) \right\}^{-1}. \qquad (3)$$

The  gamma  c.f.s  in  [4][5]  reduce  to  (3)  when  the  gamma  pa¬ 
rameter  equals  unity. 
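
Expression (3) is easy to evaluate numerically for small L. The sketch below forms the matrix I_L - 2j diag(ω) K_cc and inverts its determinant, using a plain cofactor expansion so that no external library is needed. The function names are illustrative assumptions; for L = 1 the result reduces to the familiar exponential characteristic function 1/(1 - 2jωσ²).

```python
def det(m):
    """Determinant of a small square matrix (list of lists) by cofactor expansion."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def exponential_cf_iid(omega, K_cc):
    """Evaluate 1 / det(I - 2j * diag(omega) * K_cc), i.e. expression (3)."""
    L = len(omega)
    # diag(omega) K_cc scales row r of K_cc by omega[r].
    M = [[(1 if r == c else 0) - 2j * omega[r] * K_cc[r][c] for c in range(L)]
         for r in range(L)]
    return 1 / det(M)
```

Cofactor expansion costs factorial time, so this is only meant for the small diversity orders typically considered; a library determinant would be the natural replacement for larger L.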

IV.  Bivariate  Case 

By putting L = 2 and the circularity condition in (1), we obtain
the bivariate Rayleigh p.d.f. of [2]. The structure of this
p.d.f.  does  not  simplify  further  in  the  case  of  i.i.d.  generating 
vectors  Xc  and  Xs. 

Abstract — Recently, Ericson and Zinoviev presented
a clever, new construction for spherical codes
for  the  Gaussian  channel  using  ideas  of  code  concate¬ 
nation  and  set  partitioning.  This  family  of  new  spher¬ 
ical  codes  is  generated  from  sets  of  binary  codes  using 
equally  spaced  symmetric  pointsets  on  the  real  line. 
The  family  contains  some  of  the  best  known  spherical 
codes  in  terms  of  minimum  distance.  In  this  paper, 
we  present  a  new  decoding  algorithm  for  this  family  of 
spherical  codes  which  is  more  efficient  than  maximum 
likelihood  decoding.  At  low  signal  to  noise  ratios,  it  is 
99%  equivalent  to  maximum  likelihood  but  takes  just 
2%  of  the  computational  time. 

I.  Ericson  and  Zinoviev’s  Code  Construction 

In [1], a clever construction of spherical codes, some with optimal
minimum distance, for the Gaussian channel is presented.
We  include  those  same  results  in  a  modified  form  for  even 
alphabet  size. 

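The code construction begins with choosing K even and
the code alphabet $L_K = \{\pm\tfrac{1}{2}, \pm\tfrac{3}{2}, \ldots, \pm\tfrac{K-1}{2}\}$. Take the label set
$\{0, 1, \ldots, \tfrac{K}{2} - 1\}$ and form a tree with node labels
$T = \{0, 1, \ldots, \tfrac{K}{2} - 1\} \cup \{\lambda, *\}$, using the following rules.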

1. The root of the tree is $\lambda$ and $\lambda$ is adjacent only to *.
Every internal node has exactly two children except for
$\lambda$. We will say that node $\lambda$ is at level -1, * is at level
0, the children of * are at level 1, etc.

2. The children of * are labeled 0 and 1 with 0 being the
left child.

3. For succeeding levels, say level k, the left child of a
node at level k - 1 is labeled the same as its parent and
the right child is chosen from the label set $\{0, 1, \ldots, \tfrac{K}{2} - 1\}$
so that the sum of the labels of the two children is $2^k - 1$.
If that is impossible, the node at level k - 1 is a leaf.
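
The labeling rules above can be sketched as a short routine that lists the node labels level by level, starting from the children of *. It reads rule 3 literally: a node at level k - 1 gets a left child with its own label and a right child whose label makes the two sum to 2^k - 1, provided that label lies in {0, ..., K/2 - 1}; otherwise the node is treated as a leaf. The function name and the restriction to even K of at least 4 are assumptions of this sketch.

```python
def build_label_tree(K):
    """Return the node labels per level (level 1, level 2, ...) below the node '*'."""
    if K < 4 or K % 2 != 0:
        raise ValueError("this sketch assumes an even K >= 4")
    max_label = K // 2 - 1
    levels = [[0, 1]]            # rule 2: the children of '*' are 0 (left) and 1 (right)
    k = 2
    while True:
        next_level = []
        for parent in levels[-1]:
            right = 2 ** k - 1 - parent      # rule 3: labels of the two children sum to 2^k - 1
            if 0 <= right <= max_label:
                next_level.extend([parent, right])
            # otherwise the node at level k - 1 is a leaf
        if not next_level:
            break
        levels.append(next_level)
        k += 1
    return levels
```

For K = 8 the routine stops after two levels, with the second level containing each of the labels 0 to 3 once; when K/2 is not a power of two some nodes become leaves earlier than others and the tree is unbalanced.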

We choose a binary code for each internal node of the tree.
Codes at level k will be designated $C_\gamma^k$, where $\gamma$ is the label of
the corresponding node on the tree. An arbitrary code $C_\lambda^{-1}$
of length n is chosen for node $\lambda$. A code $C_*^0$ of length n and
constant weight $w_*^0$ is chosen for node *. Suppose internal
node $\gamma$ at level k - 1 ($k \ge 1$) has internal node left child $\gamma_l$
and internal node right child $\gamma_r$, and a code $C_\gamma^{k-1}$ of length $n_\gamma^{k-1}$
and constant weight $w_\gamma^{k-1}$ has been chosen for node $\gamma$. Then a
code $C_{\gamma_l}^k$ of length $n_{\gamma_l}^k = n_\gamma^{k-1} - w_\gamma^{k-1}$ and constant weight
$w_{\gamma_l}^k$ is chosen for node $\gamma_l$, and a code $C_{\gamma_r}^k$ of length $n_{\gamma_r}^k = w_\gamma^{k-1}$
and constant weight $w_{\gamma_r}^k$ is chosen for node $\gamma_r$.

The tree of binary codes and the alphabet $L_K$ are used to
form a spherical code, X, of length n for the Gaussian
channel. For each collection of codewords $\{c_\gamma^k \in C_\gamma^k \mid
C_\gamma^k \text{ is a code in the tree}\}$, we form a codeword $x \in X$ in the
following manner. Suppose the tree has m + 1 levels of internal
vertices. We form an m + 1 by n matrix where the rows
are labeled by the levels of the tree and the i-th row consists
of the codewords chosen from the codes at that level in the
tree. We arrange the codewords in row i in a special manner
depending on the binary codes chosen in the i-th level of the
tree. The binary sequences that are the columns of the matrix
correspond to the components of x and there is an algorithm
to translate each binary sequence into an element of $L_K$.

The following result relating the minimum distance of the
spherical code X to the minimum distances of the binary codes
$\{C_\gamma^k \mid k \ge -1\}$ appears in [1].

Theorem 1 Let X be the spherical code generated by Ericson
and Zinoviev’s construction using the binary codes $\{C_\gamma^k \mid k \ge -1\}$.
Let $d_\gamma^k$ be the minimum Hamming distance of the code $C_\gamma^k$
and let $d^2$ be the (unnormalized) minimum squared distance of
X. Then $d^2 \ge \min\{d_\gamma^k \cdot 4^{k+1} \mid k \ge -1\}$.

II.  Decoding  Algorithm 

The first step is to perform binary partitions of the alphabet
$L_K$, which we now simply denote L. Our partitions are made
in a tree structure and have the same properties as the partitions
of the set $\mathbb{Z} + \tfrac{1}{2}$ in [1]. We call the elements of the partition
subalphabets.

Let $x = (x_1, x_2, \ldots, x_n) \in X$, where $x_1, x_2, \ldots, x_n \in L$,
be the word obtained by Ericson and Zinoviev’s construction
from the code words $c^1, c^2, \ldots, c^s$ of $C^1, C^2, \ldots, C^s$, respectively.
Suppose $d_i$ = minimum Hamming distance of $C^i$ and
$\rho_i$ = squared minimum distance of the subalphabets at level
i. Let $y = (y_1, y_2, \ldots, y_n)$, $y_j \in \mathbb{R}$, be the received word corrupted
by noise. The new decoding algorithm consists of s
steps, where step i finds $c^i$, $i = 1, \ldots, s$. At each step,
the decoding algorithm is divided into an inner code decoding
algorithm and an outer code decoding algorithm. The outer
code decoding algorithm incorporates Forney’s idea of error
and erasure decoding and Zinoviev’s idea of distance decoding.

Theorem 2 Let x be the transmitted codeword constructed from
the binary codewords $c^1, c^2, \ldots, c^i, \ldots, c^s$ and y the received
word corrupted by noise. Assume that the first code vectors
$c^1, c^2, \ldots, c^{i-1}$ have been found correctly. If $\rho(x, y) < d_i \rho_i / 4$,
then the decoding algorithm will correctly decode to codeword
$c^i$.

Abstract — Quotients of $\mathbb{R}^2$ by translation groups
are metric spaces known as flat tori. We start from
codes which are vertices of closed graphs on a flat
torus and, through an identification of these with a 2-
dimensional surface in a 3-dimensional sphere in $\mathbb{R}^4$,
we show such graph signal sets generate [M, 4] Slepian-
type cyclic codes for $M = a^2 + b^2$; $a, b \in \mathbb{Z}$, gcd(a, b) = 1.
The cyclic labeling of these codes corresponds to walking
step-by-step on an (a, b)-type knot on a flat torus
and its performance is better when compared with either
the standard M-PSK or any cartesian product of
$M_1$-PSK and $M_2$-PSK, $M_1 M_2 = M$.

Group codes introduced by D. Slepian and developed in
subsequent articles are defined as finite sets on an n-dimensional
sphere generated by the action of a group of orthogonal matrices.
Geometrically uniform codes introduced by Forney
[3] generalize this concept by considering also infinite sets of
points in Euclidean space having a transitive symmetry group.
As in [2], we consider here those codes extended to the wider
context of metric spaces: a signal set $S \subset X$ is a geometrically
uniform code if and only if for any s, t in S there is an
isometry f (depending on s, t) of X such that f(s) = t and
f(S) = S. We still have all the highly desirable properties
that come from homogeneity: same distance profile, congruent
Voronoi regions and same transmission error probability
for each codeword. The metric space considered here is the
flat torus, obtained by identifying the opposite sides of a parallelogram
based on plane vectors u and v. If G is the group
generated by translations by u and v, the corresponding flat
torus can be defined as the quotient $T_{(a,b)} = \mathbb{R}^2/G$, which
means that the equivalence relation in the plane is given by

$$P' \approx P \iff P - P' = mu + nv, \quad m, n \in \mathbb{Z}.$$

A flat torus can be visualized as a standard torus in 3-
space, but it can be distinguished from the latter by being perfectly
homogeneous and locally like a piece of plane (flat). It
can only be realized isometrically as a 2-dimensional surface
in $\mathbb{R}^4$ which is contained in a 3-dimensional sphere.

Starting from the square lattice $\mathbb{Z}^2$ in the plane, we induce
a closed graph $\Gamma(a,b)$ with $M = a^2 + b^2$ vertices on the flat torus
generated by the rotated square based on vectors u = (a, b)
and v = (-b, a), $a, b \in \mathbb{Z}$. An isometry which embeds this flat
torus in a 3-dimensional sphere in $\mathbb{R}^4$ can be induced by the
map $\varphi(x, y)$ displayed at the end of this abstract.


Vertical translation by one in the plane corresponds to an
orthogonal 4x4 matrix g which is a product of rotations by
angles $2\pi b/(a^2+b^2)$ and $2\pi a/(a^2+b^2)$. If a and b are coprime, this matrix
generates a cyclic group of order $a^2 + b^2$, which means that all
vertices can be reached starting from any point and going
north. This labeling can be identified with walking step-by-step
along an (a, b)-type knot on T. The included figure
illustrates the homogeneous 13-vertex closed graph on the flat
torus (right side) labeled by $\mathbb{Z}_{13}$ through vertical translation,
walking on a (2, 3)-type (trefoil) knot (left side).
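
The cyclic labeling can be checked numerically. The sketch below assumes the embedding has the four-coordinate cosine/sine form given in this abstract, with scale factor $\sqrt{a^2+b^2}/(2\pi)$; the function names are illustrative and not taken from the paper. It counts the distinct vertices reached by repeated unit steps north and measures the smallest chordal distance after rescaling to the unit 3-sphere.

```python
import math

def embed(x, y, a, b):
    """Map the plane point (x, y) into R^4 (assumed form of the embedding)."""
    m = a * a + b * b
    r = math.sqrt(m) / (2 * math.pi)
    t1 = 2 * math.pi * (a * x + b * y) / m
    t2 = 2 * math.pi * (a * y - b * x) / m
    return (r * math.cos(t1), r * math.sin(t1),
            r * math.cos(t2), r * math.sin(t2))

def north_orbit(a, b):
    """Angle residues (k*b mod M, k*a mod M) after k = 0..M-1 unit steps north."""
    m = a * a + b * b
    return [((k * b) % m, (k * a) % m) for k in range(m)]

def orbit_size(a, b):
    """Number of distinct vertices reached from the origin by going north."""
    return len(set(north_orbit(a, b)))

def min_distance_unit_sphere(a, b):
    """Smallest distance from the origin's image to the rest of its orbit,
    after rescaling the 3-sphere to radius one."""
    m = a * a + b * b
    radius = math.sqrt(2 * m) / (2 * math.pi)
    pts = [embed(0, k, a, b) for k in range(orbit_size(a, b))]
    return min(math.dist(pts[0], p) for p in pts[1:]) / radius

print(orbit_size(2, 3))   # 13: every vertex is reached when gcd(a, b) = 1
print(orbit_size(2, 2))   # 4 = (a^2 + b^2) / gcd(a, b)
```

For coprime a and b the orbit has $a^2 + b^2$ elements, so by homogeneity the distance from one point to all others is enough to find the minimum distance of the whole signal set.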


Formally, considering the above notation for T(a,b), $\Gamma(a,b)$
and g, we can state:

Proposition 1 The vertices of the unit square graph $\Gamma(a,b)$
on the flat torus T(a,b) correspond, through the isometry induced
by $\varphi$, to a Slepian-type code S of order $M = a^2 + b^2$ on the
3-sphere of radius $\sqrt{2(a^2+b^2)}/(2\pi)$ in $\mathbb{R}^4$. Besides:

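(i) If gcd(a, b) = 1, S is generated by a single point
$\varphi((0,0)) = \frac{\sqrt{a^2+b^2}}{2\pi}(1, 0, 1, 0)$ through the action of the cyclic
group $\langle g \rangle \cong \mathbb{Z}_{a^2+b^2}$ (g is the direct product of rotations whose
angles are $2\pi b/(a^2+b^2)$ and $2\pi a/(a^2+b^2)$).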

(ii) If gcd(a, b) = m > 1, S is generated by a minimal set
$\varphi((k, 0))$, $0 \le k \le m - 1$, through the cyclic group $\langle g_1 \rangle \cong
\mathbb{Z}_{(a^2+b^2)/m}$; that is, the group of orthogonal 4x4 matrices
that acts transitively on S is isomorphic to $\mathbb{Z}_m \times \mathbb{Z}_{(a^2+b^2)/m}$.

(iii) The minimal Euclidean distance, d, in $\mathbb{R}^4$ between two
Slepian signals, considering the 3-sphere re-scaled to radius
one, is given by:

$$d = \left(2 - \cos\frac{2\pi a}{a^2+b^2} - \cos\frac{2\pi b}{a^2+b^2}\right)^{1/2}.$$

In  [1]  a  graph  metric  approach  for  geometrically  uniform 
codes  of  any  order  M  on  flat  tori  is  summarized. 


$$\varphi(x, y) = \frac{\sqrt{a^2+b^2}}{2\pi}\left(\cos\frac{2\pi(ax + by)}{a^2+b^2},\ \sin\frac{2\pi(ax + by)}{a^2+b^2},\ \cos\frac{2\pi(ay - bx)}{a^2+b^2},\ \sin\frac{2\pi(ay - bx)}{a^2+b^2}\right)$$
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Abstract  —  A  majority  logic  decoding  is  suitable  for 
ASIC design of the proposed code. A four-dimensional,
size-five code of 625-bit length was implemented in
VLSI and attained an operating speed of up to 50 Mbps
and 32-bit burst correction.

I.  Introduction 

Recent code requirements are to attain high-speed operation
and robust correction ability for long burst errors. The
proposed code is built on the geometric structure
of a high-dimensional cube or torus. The code properties
depend not on the Hamming distance but on the geometric
size and symmetry of the code. The characteristics and the
uncorrectable symmetrical solids are discussed. This paper describes
a majority logic decoding scheme which is suitable for high-speed
operation on an ASIC.

II.  Code  Principle 

A code block is wound up into a small symmetrical cube of
size m in a high-dimensional (n-D) code space. Each digit of the cube
satisfies n parity check relations, one for each axial check line containing
m digits. The transmission rate is $(1 - 1/m)^n$.
Both ends of each parity line are identified,
forming a closed circle by way of the parity function, so the cube
topologically becomes an n-dimensional discrete torus. If the
size of the cube is smaller than the geometrical mesh modeled
by the inverse of the mean error rate of the channel, the cube
can be transferred through the channel without serious errors.
The transmission order of the code digits varies in many
ways with the winding of the torus knot. For a high-D long-block
code, errors introduced by a channel become random on
each parity line, since the errors are dispersed by the winding,
regardless of whether they are random or burst errors. The correction ability
for both kinds of error is roughly given as follows: the correctable burst
length relative to the block size, and the correctable mean error rate for
random errors, are both given by a function of the inverse of the code size m.
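
As a worked check of these parameters, the short sketch below computes the block length $m^n$, the number of data digits $(m-1)^n$ and the rate $(1 - 1/m)^n$ for an n-dimensional cube of size m. The function names are illustrative; the only inputs taken from the text are the cube dimension and size.

```python
def block_length(m, n):
    """Total number of digits in an n-dimensional cube of size m."""
    return m ** n

def data_digits(m, n):
    """Digits left for data when one digit per axial line carries parity."""
    return (m - 1) ** n

def rate(m, n):
    """Transmission rate (1 - 1/m)^n."""
    return (1 - 1 / m) ** n

# Four-dimensional, size-five code
print(block_length(5, 4), data_digits(5, 4), round(rate(5, 4), 4))
```

For n = 4 and m = 5 this gives 625 digits, of which 256 carry data, for a rate of about 0.41.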

III.  Decoding  Characteristics 

The code works efficiently with a majority logic decoding
scheme based on the number of erroneous parity lines through each
digit. When a digit whose count exceeds the threshold is
corrected, the other erroneous digits on the connected parity lines
stand out and are corrected at the next decoding pass, since their
count of erroneous parity lines increases by one. With this code alone,
it is possible to iterate hard decision decoding any number of
times because the parity lines do not lose their function after
the preceding correction. Through iteration, the error successively
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decreases  to  the  probabilistic  limit  given  by  the  symmetrical 
error solid. Errors remain uncorrected for high-dimensional error
solids. For example, a symmetric n-D solid is undetected
on account of the parity function, and an (n-1)-D solid can be
detected but not corrected because the error position cannot be
determined. A half-error symmetrical n-D solid is also
uncorrected, since erroneous and true digits are interchanged during
each decoding pass. In order to correct an error solid perfectly,
the dimension of the solid should be two less than
the dimension of the code space.
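
The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' circuit: digits are stored in a dictionary keyed by n-tuples, parity is even on every axial line, the flipping rule is "more than `threshold` failed lines", and the default of 7 passes is borrowed from the iteration count quoted in the Performance section. All function names are hypothetical.

```python
from itertools import product

def lines_through(pos, m):
    """For each axis, list the positions on the axial line through pos."""
    for axis in range(len(pos)):
        yield [pos[:axis] + (i,) + pos[axis + 1:] for i in range(m)]

def encode(data, m, n):
    """Place (m-1)^n data digits and fill the last index of each axis with parity."""
    block = {pos: 0 for pos in product(range(m), repeat=n)}
    for pos in product(range(m - 1), repeat=n):
        block[pos] = data[pos]
    for axis in range(n):
        for pos in block:
            if pos[axis] == m - 1:
                line = [pos[:axis] + (i,) + pos[axis + 1:] for i in range(m - 1)]
                block[pos] = sum(block[q] for q in line) % 2
    return block

def failed_line_counts(block, m):
    """Number of axial parity lines through each digit that fail the check."""
    return {pos: sum(sum(block[q] for q in line) % 2
                     for line in lines_through(pos, m))
            for pos in block}

def decode(block, m, threshold, max_iterations=7):
    """Iterated hard-decision majority logic: flip digits above the threshold."""
    block = dict(block)
    for _ in range(max_iterations):
        counts = failed_line_counts(block, m)
        flips = [pos for pos, c in counts.items() if c > threshold]
        if not flips:
            break
        for pos in flips:
            block[pos] ^= 1
    return block

# Small example: 3-D cube of size 4 with a single flipped digit
m, n = 4, 3
data = {pos: sum(pos) % 2 for pos in product(range(m - 1), repeat=n)}
code = encode(data, m, n)
noisy = dict(code)
noisy[(1, 2, 0)] ^= 1
print(decode(noisy, m, threshold=n // 2) == code)
```

A single erroneous digit fails all n of its parity lines, while every other digit on those lines sees only one failure, so a majority threshold isolates it in one pass.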

IV.  VLSI  Implementation 

The code consists of a simple parity check calculation, and
the relationship between the parity and the data digits is
explicit. A large part of the encoding and decoding
processes was built in by wired connections between
the memory cells of the VLSI. The majority logic decoding of
each cycle in the iteration was performed in just one clock
cycle, apart from a one-block delay to receive a full code
block. This VLSI architecture resulted in increased code speed.
The encoder and decoder circuits of the four-dimensional,
size-five 4Dm5 code, whose code length is 625 bits and whose rate
is 0.41, were implemented on a 50-kilogate, 0.6-micron-rule ASIC.

V.  Performance 

The code attained high-speed operation up to 50 Mbps and
robust burst-error correction with 7 iterations. It
corrected burst errors up to 32 bits in length with zero
error. This is much better than conventional
codes, which correct 16 bits for the (15, 7) Reed-Solomon code
with q = 4 and 4 bits for a Viterbi-decoded convolutional code
with constraint length K = 7. The Turbo code of length 624 bits
with Log-MAP decoding corrects bursts of almost 4 bits,
but fails to decode two times out of ten thousand
trials, and its simulation took more than a hundred times
as long as that of the proposed code. When the code is evaluated for
random errors, its performance at a decoded bit
error rate of $10^{-3}$ to $10^{-5}$ is approximately the same
as that of the convolutional code of rate R = 7/8, K = 7 with Viterbi
decoding. For higher-grade performance, at $10^{-8}$
or less, the proposed code shows more coding gain than
Viterbi decoding of the convolutional code with R = 1/2, K = 7.
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Abstract  —  Let  M  be  a  polynomial  metric  space 
(PMS) [2] with metric d(x, y) and standard substitution
$t = \sigma(d(x, y))$. Any finite nonempty subset C of
M is called a code. A code for which $\sigma(d(x, y)) \le \sigma(d)$
($x, y \in C$), where d is the minimum distance of C, is an
$(M, |C|, \sigma)$-code. We will give some properties of the
so-called test functions for codes and we will improve the
Levenshtein bound with polynomials of degree $h(\sigma) + 2$
and $h(\sigma) + 3$.


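when $h(\sigma) > j$. We examine the sign of $G_\sigma(M, Q_j)$. Let
us consider the interval $I_{h(\sigma)} = [t^{1,1-\varepsilon}_{k-1+\varepsilon}, t^{1,\varepsilon}_{k})$ and denote
$h(t^{1,1-\varepsilon}_{k-1+\varepsilon}) = \tau$. We have $G_\sigma(M, Q_{h(\sigma)+1}) > 0$.

Lemma 1 If $G_\tau(M, Q_{\tau+2}) > 0$ then $G_\sigma(M, Q_{h(\sigma)+2}) > 0$
for $\sigma \in I_{h(\sigma)}$. If $G_\tau(M, Q_{\tau+k}) < 0$ for $k \ge 2$ then there exist
$z_0 < t^{1,1-\varepsilon}_{k-1+\varepsilon}$ and $z_1 > t^{1,1-\varepsilon}_{k-1+\varepsilon}$ such that $G_\sigma(M, Q_{h(\sigma)+k}) < 0$
for $\sigma \in [t^{1,1-\varepsilon}_{k-1+\varepsilon}, z_1)$ and $G_\sigma(M, Q_{h(\sigma)+k+1}) < 0$ for
$\sigma \in (z_0, t^{1,1-\varepsilon}_{k-1+\varepsilon})$.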


I.  Introduction 

PMS are finite metric spaces represented by P- and Q-polynomial
association schemes, as well as infinite metric spaces,
namely the real sphere, the real, complex or quaternionic
projective spaces and the Cayley projective plane. PMS are
further distinguished as antipodal and non-antipodal.
Any PMS is connected with a system of constants
$r_i$, a system of orthogonal polynomials $\{Q_i(t)\}$ and adjacent
systems of polynomials $\{Q_i^{a,b}(t)\}$ with roots $-1 < t_{k,i}^{a,b} < 1$,
$i = 1, \ldots, k$, ordered in increasing order, $t_k^{a,b} = t_{k,k}^{a,b}$. Most
of the properties of $\{Q_i^{a,b}(t)\}$ can be found in [2]. By definition
$T_k^{a,b}(x, y) = \sum_{i=0}^{k} r_i^{a,b} Q_i^{a,b}(x) Q_i^{a,b}(y)$. Many bounds for
the cardinality of codes and designs were obtained by using
the Linear Programming Theorem [2, p. 544]. If we denote by
$A_{M,\sigma}$ the set of real polynomials which satisfy the conditions
of the LP Theorem, then $|C| \le \Omega(f)$ for $f \in A_{M,\sigma}$. We will
investigate the Levenshtein bound $L(M, \sigma)$ for codes, which
can be presented in the following form [2]:


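$$|C| \le L(M, \sigma) = \Omega(f^{(\sigma)}(t)) = \left(1 - \frac{Q^{1,0}_{k-1+\varepsilon}(\sigma)}{Q^{0,\varepsilon}_{k}(\sigma)}\right) \sum_{i=0}^{k-1+\varepsilon} r_i \qquad (1)$$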


In other words, there exists an interval $I_\tau = (z_0, z_1)$ for
$\sigma$, containing $t^{1,1-\varepsilon}_{k-1+\varepsilon}$, in which $G_\sigma(M, Q_{\tau+k})$ is negative, i.e.
the Levenshtein bound can be improved in this interval using
a polynomial of degree $\tau + k$, $k \ge 2$.

Corollary 2 For antipodal PMS the test function
$G_\sigma(M, Q_{h(\sigma)+2})$ is positive.

As a consequence of the above, using our results from [3], we
conclude that the smallest possible degree of the improving
polynomials is $\tau + 2 = h(\sigma) + 2$ or $h(\sigma) + 3$ for non-antipodal
spaces and $\tau + 3 = h(\sigma) + 3$ or $h(\sigma) + 4$ for antipodal PMS.
Here we present the analytical form of the polynomials which
improve the Levenshtein bound in the non-antipodal case.

Theorem 3 Let M be a non-antipodal PMS, $\tau = h(t^{1,1-\varepsilon}_{k-1+\varepsilon})$,
and let us consider the interval $I_\tau$. Then the polynomial

r(t-,r  +  2)  =  (f-a)(t+l)‘  MT&fror))2 

+  {PxT&d,  a)  +  (*,  a)  +  T°k'\t,  a))\ 

of degree $\tau + 2$ belongs to $A_{M,\sigma}$ for constants satisfying
certain conditions.

Now using the LP Theorem with the polynomial $f^{(\sigma)}(t; \tau + 2)$
we derive a new analytical bound $V(M, \sigma)$.


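where $\varepsilon = 0$ if $t^{1,1}_{k-1} \le \sigma < t^{1,0}_{k}$ and $\varepsilon = 1$ if $t^{1,0}_{k} \le \sigma < t^{1,1}_{k}$,
and $f^{(\sigma)}(t) = (t - \sigma)(t + 1)^{\varepsilon}(T^{1,\varepsilon}_{k-1}(t, \sigma))^2$ is of degree $h(\sigma)$.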


Theorem 4 If the conditions of Theorem 3 are satisfied then
$|C| \le V(M, \sigma) = \Omega(f^{(\sigma)}(t; \tau + 2)) < L(M, \sigma)$.


II.  Test  functions  and  new  bound 

Boyvalenkov, Danev and Bumova [1] obtain necessary and
sufficient conditions for the optimality of $f^{(\sigma)}(t)$ over $A_{M,\sigma}$,
introducing the test functions $G_\sigma(M, Q_j)$. They prove that the
bound (1) can be improved by a polynomial in $A_{M,\sigma}$ of degree
j if and only if $G_\sigma(M, Q_j) < 0$. In [3] we define analogous test
functions $G_\tau(M, Q_j)$ for designs.

In this section we use the connections between codes and
designs and the corresponding test functions. Applying an
approach analogous to that of [3], we investigate the properties of the test
functions for codes and derive an analytical form of the
polynomials which improve the Levenshtein bound. For fixed j,
$G_\sigma(M, Q_j)$ is a continuous function of $\sigma$ and $G_\sigma(M, Q_j) = 0$,
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Abstract  —  A  polynomial-time  soft-decision  decod¬ 
ing  algorithm  for  Reed-Solomon  codes  is  developed. 
The  algorithm  is  algebraic  in  nature  and  builds  upon 
the  interpolation  procedure  proposed  by  Guruswami 
and  Sudan  for  hard-decision  decoding.  Algebraic  soft- 
decision  decoding  is  achieved  by  means  of  converting 
the soft-decision reliability information into a set of
interpolation points along with their multiplicities.
The  conversion  procedure  is  shown  to  be  optimal 
for  a  certain  probabilistic  model.  The  resulting  soft- 
decoding  algorithm  significantly  outperforms  both  the 
Guruswami-Sudan  decoding  and  the  generalized  min¬ 
imum  distance  (GMD)  decoding,  while  maintaining 
a  complexity  that  is  polynomial  in  the  length  of  the 
code.  Asymptotic  analysis  for  a  large  number  of  in¬ 
terpolation  points  is  presented,  culminating  in  a  com¬ 
plete  geometric  characterization  of  the  decoding  re¬ 
gions  of  the  proposed  algorithm.  The  algorithm  easily 
extends  to  polynomial-time  soft-decision  decoding  of 
BCH  codes  and  codes  from  algebraic  curves. 

I.  Introduction 

Reed-Solomon  (RS)  codes  are  one  of  the  most  extensively  used 
families  of  error-control  codes.  Since  the  discovery  of  these 
codes  four  decades  ago,  a  steady  stream  of  work  has  been 
devoted  to  their  decoding.  Nevertheless,  soft-decision  decod¬ 
ing  of  Reed-Solomon  codes  is  still  essentially  out  of  reach  of 
present-day  methods.  Indeed,  all  the  known  optimal  soft- 
decoding  algorithms  for  RS  codes  are  non-algebraic  and  run  in 
time  that  scales  exponentially  with  the  length  of  the  code.  On 
the  other  hand,  all  the  available  polynomial-time  algorithms, 
except  for  GMD  decoding  [1],  are  based  mainly  on  heuristics. 
Thus,  in  light  of  the  ubiquity  of  Reed-Solomon  codes,  efficient 
soft-decision  decoding  of  RS  codes  remains  one  of  the  most 
important  problems  in  coding  theory  and  practice. 

II.  Algebraic  Soft-Decision  Decoding 

In  the  full  version  of  this  paper  [3],  we  present  an  efficient  soft- 
decision  decoding  algorithm  for  Reed-Solomon  codes.  The  al¬ 
gorithm  is  algebraic  in  nature  and,  for  any  desired  level  of 
performance  (within  a  certain  fundamental  bound),  its  com¬ 
plexity  is  bounded  by  a  polynomial  in  the  codeword  length. 
Our  algorithm  significantly  outperforms  both  the  Guruswami- 
Sudan  [2]  decoding  and  the  GMD-based  [1]  decoding  methods. 
Figure  1  shows  the  performance  of  these  algorithms  for  a  sim¬ 
ple  coding  scheme:  a  (256,144,113)  RS  code  over  GF(256) 
concatenated  with  the  (9,  8, 2)  binary  code. 

Our  algorithm  is  based  on  the  algebraic  interpolation 
techniques  developed  by  Sudan  [2,4].  To  achieve  soft-decision 
decoding,  we  translate  the  soft-decision  reliability  information 
into  a  set  of  algebraic  constraints.  More  specifically,  given  the 
channel output vector $(y_1, y_2, \ldots, y_n)$ and the a posteriori
transition probabilities $\Pr(c_j \mid y_i)$, we iteratively compute a set
of interpolation points along with their multiplicities. We


show  that,  at  each  step  of  the  computation,  this  choice  of 
interpolation  points  is  optimal,  in  a  certain  precise  sense. 
The complexity of this computation is $O(n^2 \log n)$.
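
One way to picture the iterative conversion of reliabilities into multiplicities is a greedy loop that, at each step, gives one more unit of multiplicity to the point with the largest ratio of reliability to (current multiplicity + 1). The abstract does not spell out its rule, so the selection criterion below is an assumption, and `assign_multiplicities` and `interpolation_cost` are hypothetical names.

```python
import heapq

def assign_multiplicities(reliability, total):
    """Greedy sketch: reliability[i][j] is Pr(symbol i | y_j); distribute
    `total` units of multiplicity over the (symbol, position) points."""
    q, n = len(reliability), len(reliability[0])
    mult = [[0] * n for _ in range(q)]
    # Max-heap via negated keys; key = reliability / (multiplicity + 1)
    heap = [(-reliability[i][j], i, j)
            for i in range(q) for j in range(n) if reliability[i][j] > 0]
    heapq.heapify(heap)
    for _ in range(total):
        if not heap:
            break
        _, i, j = heapq.heappop(heap)
        mult[i][j] += 1
        heapq.heappush(heap, (-reliability[i][j] / (mult[i][j] + 1), i, j))
    return mult

def interpolation_cost(mult):
    """Number of linear constraints: sum of m(m + 1)/2 over all points."""
    return sum(m * (m + 1) // 2 for row in mult for m in row)

# Two symbols, two positions: position 0 is reliable, position 1 is a toss-up
pi = [[0.9, 0.5],
      [0.1, 0.5]]
m = assign_multiplicities(pi, 4)
print(m, interpolation_cost(m))
```

The reliable position collects multiplicity at one point, while the ambiguous position spreads it over both candidates, which is how soft information enters the interpolation step.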


Figure  1.  Performance  comparison  on  an  AWGN  channel 

The  algorithm  of  Guruswami-Sudan  [2,4]  is  based  on  alge¬ 
braic  interpolation  and  factorization  techniques  that  can  be 
implemented  efficiently  in  polynomial  time.  Our  soft-decision 
decoding procedure inherits these properties of Guruswami-Sudan
decoding. One of the most intriguing characteristics
of our soft-decoding algorithm is a complexity/performance
trade-off that can be chosen freely. Thus the coding gain
provided by the Reed-Solomon code can be traded for complexity,
in real time, in any application. Another interesting feature
of our algorithm is that it readily extends to the decoding
of BCH codes and most algebraic-geometric codes.

We  also  present  asymptotic  performance  analysis,  as  the 
number  of  interpolation  points  approaches  infinity.  The 
analysis  leads  to  a  simple  geometric  characterization  of  the 
(asymptotic) decoding regions of the algorithm. We find that
under soft-decision list-decoding, arbitrarily small probability
of error is achievable in polynomial time, provided the rate of
the code does not exceed a certain constant that depends
on the channel. Finally, we consider modifications to our
algorithm designed to maximize the set of correctable error
patterns on the following channels: the q-ary symmetric channel,
the q-ary symmetric channel with erasures, and a simplified
q-PSK channel. Surprisingly, our results for the q-ary symmetric
channel are stronger than those reported in [2], even
though this channel provides no soft-decision information.
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Abstract  —  The  paper  presents  a  Maximum  Like¬ 
lihood  Decoding  and  a  sub-optimum  decoding  algo¬ 
rithm  for  Reed- Solomon  codes.  The  proposed  algo¬ 
rithms  are  based  on  the  algebraic  structure  of  RS 
codes  represented  in  GF( 2).  Theoretical  bounds  on 
the  performance  are  derived  and  shown  to  be  accu¬ 
rate.  The  proposed  sub-optimum  algorithm  is  seen  to 
have  better  error  performance  compared  to  other  sub¬ 
optimum  decoding  algorithms  while  the  new  MLD  al¬ 
gorithm  has  significantly  lower  decoding  complexity 
when  compared  to  other  MLD  algorithms. 

I.  Introduction 

Reed-Solomon (RS) codes are a powerful class of maximum
distance separable block codes, suitable for error control on real
channels. Algebraic Hard Decision Decoding (HDD) algorithms
are widely used for RS codes. It has been shown that Soft
Decision Decoding (SDD) offers 2-3 dB coding gain in excess of HDD.
Unfortunately most SDD algorithms proposed in the past have
either been of high computational complexity or failed to
demonstrate significant performance improvement over HDD. Hence
the search for efficient SDD algorithms for RS codes still
continues.

Vardy and Be'ery proposed an MLD algorithm [1] based on
the structure of the generator matrix of RS codes represented
in GF(2). RS codes can be represented as a union of cosets.
Such partitions into cosets allow a decoding algorithm to be
developed. The algorithm is several orders of magnitude lower
in complexity compared to trellis decoding for high rate codes
up to length 15 and low rate (< 0.5) codes of any length.

We present two SDD algorithms based on the same structural
properties that the Vardy-Be'ery algorithm uses. Hence the
algorithms may be considered as modifications of the Vardy-Be'ery
algorithm. It is shown that an RS codeword is formed by
interleaving words chosen (in a certain order) from either a
binary BCH code or one of its cosets. This property is used
to derive a computationally efficient ML SDD algorithm. The
reduction in complexity achieved with reference to the Vardy-Be'ery
algorithm is considerable. The proposed algorithm can
be changed into a sub-optimum algorithm, thus trading off
complexity against performance.

II.  Decoding 

Let $g_{RS}(X)$ be the generator polynomial of an (N, K) RS code,
$C_{RS}$, over $GF(2^m)$. If $\alpha$ is a primitive element of $GF(2^m)$,
$g_{RS}(X)$ is given by

$$g_{RS}(X) = \prod_{i=1}^{2t} (X + \alpha^i) \qquad (1)$$

where 2t = N - K. Now define an (N, k) binary BCH
code, $C_{BCH}$, with generator polynomial $g_{BCH}(X)$ with roots
$\{\alpha, \alpha^2, \alpha^3, \ldots, \alpha^{2t}\}$ and their cyclotomic conjugates over
$GF(2^m)$. The message length k is less than or equal to K.
Define a transformation $\phi : GF(2^m) \to GF(2)^m$ with basis
$\{\gamma_0, \gamma_1, \ldots, \gamma_{m-1}\}$. Using this transformation, a code
polynomial $c_{RS}(X)$ of $C_{RS}$ is given by:

$$c_{RS}(X) = \sum_{j=0}^{m-1} \gamma_j \left[ c^{(j)}_{BCH}(X) + l^{(j)}(X) \right] = \sum_{j=0}^{m-1} \gamma_j c^{(j)}_{BCH}(X) + \sum_{j=0}^{m-1} \gamma_j l^{(j)}(X) \qquad (2)$$

where $c^{(j)}_{BCH}(X) \in C_{BCH}$ and $l^{(j)}(X)$ is a coset leader
polynomial.

We use the above algebraic property to devise an efficient
decoding algorithm.
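
The two generator polynomials can be made concrete for a small case. The sketch below builds $GF(2^4)$ from the primitive polynomial $x^4 + x + 1$ (an assumed choice; the abstract does not fix one), multiplies out the product of the factors $(X + \alpha^i)$ for a (15, 11) RS code, and counts the roots of the matching binary BCH code through cyclotomic cosets to confirm that its dimension k does not exceed K. All function names are illustrative.

```python
def gf_tables(m, prim_poly):
    """Exponent/log tables for GF(2^m) generated by a primitive polynomial."""
    size = 1 << m
    exp = [0] * (2 * (size - 1))
    log = [0] * size
    x = 1
    for i in range(size - 1):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & size:
            x ^= prim_poly
    for i in range(size - 1, 2 * (size - 1)):
        exp[i] = exp[i - (size - 1)]
    return exp, log

def gf_mul(a, b, exp, log):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]

def rs_generator(two_t, exp, log):
    """Coefficients (lowest degree first) of prod_{i=1..2t} (X + alpha^i)."""
    g = [1]
    for i in range(1, two_t + 1):
        root = exp[i]
        new = [0] * (len(g) + 1)
        for d, c in enumerate(g):
            new[d] ^= gf_mul(c, root, exp, log)  # c * alpha^i contributes to X^d
            new[d + 1] ^= c                      # c * X contributes to X^(d+1)
        g = new
    return g

def poly_eval(p, x, exp, log):
    """Evaluate a polynomial (lowest degree first) at x by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x, exp, log) ^ c
    return acc

def bch_dimension(n, two_t):
    """k = n minus the number of exponents in the cyclotomic cosets of 1..2t."""
    roots = set()
    for i in range(1, two_t + 1):
        j = i % n
        while j not in roots:
            roots.add(j)
            j = (2 * j) % n
    return n - len(roots)

exp, log = gf_tables(4, 0b10011)       # x^4 + x + 1
g = rs_generator(4, exp, log)          # (15, 11) RS code, 2t = 4
print(len(g) - 1, bch_dimension(15, 4))
```

For the (15, 11) code the RS generator has degree 4, while the BCH generator picks up the conjugates of $\alpha$ and $\alpha^3$ and has degree 8, so k = 7 is smaller than K = 11.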

III.  Simulation  Results 

The proposed algorithms were applied to a range of
Reed-Solomon codes up to length 127 and minimum Hamming
distance up to 7. The simulation results are obtained for
antipodal signalling over an AWGN channel. Table 1 gives the
bit energy to noise ratio required to achieve a BER of $10^{-5}$ for
the proposed algorithms, GMD and the Berlekamp-Massey
HDD algorithm. It is observed that the proposed MLD
algorithm requires 1.9-3 dB lower SNR to achieve the target BER
of $10^{-5}$, compared to the HDD algorithm. Table 1 also shows
that the proposed sub-optimum algorithm achieves
near-MLD performance for all codes tested. The loss in
performance at a BER of $10^{-5}$ is consistently below 1.0 dB.


RS Code     d_min   E_b/N_0 at BER = 10^-5
                    MLD      SOPT     GMD      HDD
(31,29)     3       6.4 dB   6.6 dB   7.9 dB   8.4 dB
(63,61)     3       6.6 dB   6.8 dB   8.1 dB   8.6 dB
(127,125)   3       6.8 dB   7.0 dB   8.4 dB   8.8 dB
(15,11)     5       5.3 dB   5.6 dB   7.2 dB   7.8 dB
(31,27)     5       5.2 dB   5.3 dB   7.2 dB   7.6 dB
(63,59)     5       5.5 dB   6.3 dB   7.6 dB   7.8 dB
(15,9)      7       4.5 dB   5.1 dB   6.9 dB   7.6 dB
(31,25)     7       4.2 dB   5.2 dB   6.7 dB   7.2 dB

Table 1: Required E_b/N_0 to achieve a BER of 10^-5 for various
codes and decoding algorithms.
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Abstract  —  We  use  the  Plotkin  ( u ,  it-F ^-construction 
for general Reed-Muller codes (m, r) and relegate
decoding to the two constituent RM codes. First, we
use the better protected code (m-1, r-1) to find a
subblock v. Then we proceed with the block u from the
code (m-1, r). We repeat this recursion on both halves
and recalculate the reliabilities of the received
symbols. In the end, we perform ML decoding on the
biorthogonal codes.

I.  Recursive  techniques 

Below, general Reed-Muller codes RM(r, m) are denoted
$\left\{{m \atop r}\right\}$. The Plotkin construction represents these codes in the form
(u, u + v), where $u \in \left\{{m-1 \atop r}\right\}$ and $v \in \left\{{m-1 \atop r-1}\right\}$. By splitting
both halves, we obtain shorter RM codes until we arrive at the
biorthogonal codes (order 1) or single-parity check codes (order
one less than their number of variables).

Now consider the received block (u, u + v) corrupted by
noise. We first try to find the better protected block v. In hard
decision decoding, we use its corrupted version u + (u + v). In a
more general setting, we first use the left half u and find the
posterior probability $p'_i$ that $u_i = 0$ for each symbol $u_i$, given the
corresponding received symbol.
Similarly, we use the right half u + v to find the posterior
probability $p''_i$ of any symbol $u_i + v_i$. Then any symbol $v_i$ has
posterior probability:

$$p(v_i) = p'_i p''_i + (1 - p'_i)(1 - p''_i).$$

In Step 1 of our algorithm, we use the probabilities $p(v_i)$ to
execute soft-decision decoding of the vector v into the $\left\{{m-1 \atop r-1}\right\}$-code.
The result of decoding is a (presumably correct) codeword v.

After v is found, we have two corrupted copies of the vector u,
namely u in the left half, and (u + v) + v in the right half. Our
next goal is to jointly decode both copies. Similarly to Step
1, we use the posterior probabilities $p(u_i)$ of the symbols $u_i$. Here we
combine the two estimates of $u_i$ obtained from both corrupted
copies. Finally, we perform soft decision decoding and find
the (presumably correct) subblock $u \in \left\{{m-1 \atop r}\right\}$.
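
The two reliability updates can be sketched in a few lines. The first function is the formula for $p(v_i)$ given above. The second combines the two copies of $u_i$ by Bayes' rule, assuming the two halves are corrupted independently; the abstract does not state its combining rule, so that step and both function names are assumptions.

```python
def prob_v_zero(p_left, p_right):
    """p(v_i) = p'_i p''_i + (1 - p'_i)(1 - p''_i): v_i = 0 exactly when
    u_i and u_i + v_i agree."""
    return [a * b + (1 - a) * (1 - b) for a, b in zip(p_left, p_right)]

def prob_u_zero(p_left, p_right, v_hat):
    """Combine both copies of u_i once v has been decoded (assumed rule:
    independent observations, normalised product of likelihoods)."""
    out = []
    for a, b, v in zip(p_left, p_right, v_hat):
        q = b if v == 0 else 1 - b      # Pr{u_i = 0} seen from the right half
        num = a * q
        den = num + (1 - a) * (1 - q)
        out.append(num / den if den > 0 else 0.5)
    return out

p1, p2 = [0.9, 0.6], [0.8, 0.3]
print(prob_v_zero(p1, p2))
print(prob_u_zero(p1, p2, [0, 1]))
```

When both halves agree with the decoded v, the combined estimate of $u_i$ is more reliable than either copy alone; when they disagree, it moves back towards 1/2.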

In a general scheme, decoding on the length n/2 is again
relegated to the shorter codes. On all intermediate steps we only
recalculate symbol reliabilities. Maximum likelihood decoding
is executed at the end nodes, that is, at the biorthogonal and
single-parity check codes. Decoding requires about $O(n \log n)$ operations.

It can be shown that the output bit error rates vary
significantly across the different end nodes. In particular, the highest (worst)
BER is obtained on the node $\left\{{m-r+1 \atop 1}\right\}$ that is decoded first.
An important conclusion is to set the corresponding
information bits to zero. In this way, we improve the overall
performance by taking subcodes that eliminate a few of the least
protected information bits in the original code $\left\{{m \atop r}\right\}$.
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In  asymptotic  setting  [4] ,  our  decoding  increasingly  outper¬ 
forms  both  the  majority  algorithm  and  the  former  recursive 
techniques [1]-[3] as the block length grows. In particular, for
long RM codes of fixed rate, we increase the bounded-distance
threshold ln d times and correct most error patterns of weight
up to (d ln d)/2. Simulation results presented below show that
this improvement starts at very short lengths.

II.  Simulation  results 

Table 1 summarizes simulation results for the RM code $\left\{{9 \atop 4}\right\}$
of length 512 and dimension 256. We also consider a subcode
of dimension 223 and present both bit (BER) and block
(BLER) error rates. The results are compared with the
former recursive technique presented in [3]. Similar results
are given in Table 2 for the RM code $\left\{{9 \atop 3}\right\}$ of dimension 130
and its subcode of dimension 87.
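
The code parameters quoted above can be cross-checked from the standard dimension formula for Reed-Muller codes, $k = \sum_{i=0}^{r} \binom{m}{i}$ with length $2^m$. The helper names below are illustrative. Length 512 with dimensions 256 and 130 corresponds to m = 9 with r = 4 and r = 3, and the Plotkin split of a code into its two constituents adds up as expected.

```python
from math import comb

def rm_length(m):
    """Block length of RM(r, m)."""
    return 2 ** m

def rm_dimension(m, r):
    """Dimension of RM(r, m): number of monomials of degree at most r."""
    return sum(comb(m, i) for i in range(r + 1))

for m, r in [(9, 4), (9, 3), (8, 2)]:
    print((m, r), rm_length(m), rm_dimension(m, r))

# Plotkin (u, u + v) split: dim{m, r} = dim{m-1, r} + dim{m-1, r-1}
print(rm_dimension(9, 4) == rm_dimension(8, 4) + rm_dimension(8, 3))
```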


Table 1. Output error rates for code $\left\{{9 \atop 4}\right\}$.

SNR (dB)             2       3          4
Recursive [3]        0.9     0.5        0.2
Recursive (new)      0.2     0.03       2·10^-3
BER for subcode      0.05    3·10^-3    3·10^-[illegible]
BLER for subcode     0.2     0.02       2·10^-4

Table 2. Output error rates for code $\left\{{9 \atop 3}\right\}$.

SNR (dB)             2       3              4
Recursive            0.2     0.08           8·10^-3
BER for subcode      0.02    10^-3          [illegible]
BLER for subcode     0.08    [illegible]    10^-4

Further improvements of recursive techniques are presented
below for the RM code $\left\{{8 \atop 2}\right\}$ of length 256 and dimension 37. For
these (or similar) parameters, our decoding outperforms all
suboptimal algorithms known to date.

Table 3. Output bit error rates for RM code $\left\{{8 \atop 2}\right\}$.

SNR (dB)    1          1.5        2          2.5        3
BER         10^-2      4·10^-3    10^-3      2·10^-4    2·10^-5
BLER        4·10^-2    10^-2      3·10^-3    5·10^-4    8·10^-5
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Abstract  —  In  this  paper,  the  statistical  approach 
proposed  by  Agrawal  and  Vardy  to  evaluate  the  error 
performance  of  the  Generalized  Minimum  Distance 
(GMD)  decoding  is  extended  to  other  reliability  based 
decoding  algorithms  for  binary  linear  block  codes, 
namely  Chase-type,  combined  GMD  and  Chase-type, 
and ordered statistic decodings. In all cases, tighter
and simpler bounds than previously proposed ones
have been obtained with this approach.

I.  Summary 

A  difficult  task  related  to  suboptimum  decoding  algorithms 
is  their  error  performance  analysis  at  practical  SNR  values.  It 
has  long  been  believed  that  a  good  criterion  to  design  a  subop¬ 
timum  soft  decision  decoding  algorithm  was  to  prove  that  the 
algorithm  achieves  bounded  distance  decoding  (or  is  asymp¬ 
totically  optimum).  However,  recent  studies  indicate  that  this 
simple  criterion  usually  does  not  reflect  the  behavior  of  the  al¬ 
gorithm  considered  at  practical  SNR  values.  In  particular,  an 
approach  based  on  the  union  bound  is  highly  misleading  and 
more  sophisticated  bounding  methods  are  needed. 

In  [1],  a  new  upper  bound  on  the  error  performance  of 
GMD  decoding  [2]  has  been  presented.  Interestingly,  under 
some  mild  assumptions,  this  upper  bound  is  tight  at  all  SNR 
values. The error performance analysis of [1] is based on the
probability density functions of the j-th ordered reliability
value among i hard-decision errors in a received sequence of
length N, for $1 \le j \le i$, and on the probability density
functions of the l-th ordered reliability value among the remaining
N - i correct hard decisions in the received sequence of length
N, for $1 \le l \le N - i$.

In  this  paper,  we  first  extend  the  approach  of  [1]  to  evalu¬ 
ate  the  error  performance  of  Chase-type  decoding.  For  the 
algorithm-2  introduced  in  [3]  and  BPSK  transmission  over 
an  AWGN  channel,  the  obtained  bound  falls  on  top  of  the 
simulated results at all SNR values, as depicted in Fig. 1 for
Chase-2 decoding applied to the p = 7 and p = 10 least reliable
positions of the received sequence for the (127,64) BCH code.
The  bounding  method  is  then  applied  to  the  combination  of 
GMD  and  Chase-type  decodings  as  introduced  in  [4].  Tight 
bounds  are  obtained  for  the  entire  family  of  algorithms  corre¬ 
sponding  to  this  generalization.  Finally,  the  bounding  method 
is  applied  to  the  ordered  statistic  decoding  (OSD)  algorithm 
of  [5].  The  computational  complexities  of  the  corresponding 
bounds  are  smaller  than  that  of  the  bounds  derived  in  [5]  for 
high  orders  of  reprocessing.  The  new  bounds  are  compared 
with  the  simulation  results  of  OSD  of  the  (128,64)  extended 
BCH  (eBCH)  code  in  Fig.  2.  The  detailed  derivations  of  these 
bounds  are  given  in  [6]. 
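
For readers unfamiliar with Chase-2 decoding on the p least reliable positions, the sketch below enumerates the $2^p$ test vectors that would each be passed to an algebraic hard-decision decoder (not shown). It assumes BPSK with bit 0 mapped to +1 and bit 1 to -1 and uses the magnitude of the received value as the reliability; the function name is hypothetical.

```python
from itertools import product

def chase2_test_vectors(received, p):
    """Yield the 2^p hard-decision vectors obtained by trying every bit
    pattern on the p least reliable positions."""
    hard = [0 if r >= 0 else 1 for r in received]            # assumed mapping
    order = sorted(range(len(received)), key=lambda i: abs(received[i]))
    weak = order[:p]                                          # least reliable
    for flips in product((0, 1), repeat=p):
        cand = hard[:]
        for pos, f in zip(weak, flips):
            cand[pos] ^= f
        yield cand

y = [0.9, -0.1, 0.2, -1.3]
for vec in chase2_test_vectors(y, 2):
    print(vec)
```

The number of candidates grows as $2^p$, which is why the bound is evaluated for moderate values such as p = 7 and p = 10.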

1This  work  was  supported  by  the  National  Science  Foundation 
under  Grant  CCR-97-32959. 
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Figure  1:  Word  error  rate  for  p-Chase  decoding  of  the 
(127,64) BCH code with p = 7 and p = 10.


Figure  2:  Word  error  rate  for  each  stage  of  order-4  OSD  of 
the  (128,64)  eBCH  code. 
Abstract - The trellis complexity of causal and non-causal
interleavers is studied via the introduction of the input-output
interleaver code. The “average” complexity of a uniform
interleaver is computed. The trellis complexity of a turbo code
is then tied to the complexity of the constituent interleaver. A
procedure for complexity reduction by coordinate permutation is
also presented, together with some examples of its application.

I.  Introduction 

For a block code C(n,k), the most used trellis complexity
parameters are: the maximum state complexity
$S(C) = \max_{0 \le i \le n} s(i)$, where $s(i) = \log_2 |\Sigma(i)|$
and $\Sigma(i)$ is the state space at time $0 \le i \le n$; the
maximum branch complexity $B(C) = \max_{0 \le i < n} b(i)$, where
$b(i) = \log_2 |\Gamma(i, i+1)|$ and $\Gamma(i, i+1)$ is the trellis
section at time $0 \le i < n$; and the average branch symbol
complexity $E(C) = \left(\sum_{i=0}^{n-1} |\Gamma(i, i+1)|\right)/k$.
It is well known that coordinate permutations $\rho$ can strongly
change the complexity parameters. In other words, given C, one can
base a “real” measure of the complexity of C upon the parameters
$S = \min_\rho \{S(\rho(C))\}$, $B = \min_\rho \{B(\rho(C))\}$, and
$E = \min_\rho \{E(\rho(C))\}$.

II.  Interleavers 

An interleaver $\mathcal{I}$ is a device characterized by a fixed
permutation $\rho_{\mathcal{I}} : Z \leftrightarrow Z$.
$\mathcal{I}$ maps bi-infinite binary input sequences x into
permuted output sequences y with $y(i) = x(\rho_{\mathcal{I}}(i))$.
Given an interleaver $\mathcal{I}$, we introduce the (input-output)
interleaver code $C_{\mathcal{I}}$, defined as the set of all
input/output interleaver sequence pairs (x, y). For causal
interleavers, it is well known and intuitive that the state space
size is constant. When more general interleavers (including
non-causal ones) are considered, we have [1]:

Theorem  1 

For every interleaver code $C_{\mathcal{I}}$:
$s_{\mathcal{I}}(i) = |\mathcal{A}_i| + |\mathcal{P}_i|$, where
$\mathcal{A}_i = \{j \in Z : j < i,\ \rho(j) \ge i\}$ and
$\mathcal{P}_i = \{j \in Z : j \ge i,\ \rho(j) < i\}$.
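To make the counting in Theorem 1 concrete, here is a minimal sketch that evaluates the state profile of a finite (block) permutation. It assumes the sets are read as A_i = {j < i : ρ(j) ≥ i} and P_i = {j ≥ i : ρ(j) < i}, and it restricts the bi-infinite setting to positions 0, ..., N−1; the function name and the two sample permutations are illustrative only.

```python
def interleaver_state_profile(perm):
    """State profile s(i) = |A_i| + |P_i| for a finite permutation.

    perm[j] is the image of position j (positions 0..N-1). The profile is
    evaluated at the N+1 section boundaries i = 0..N.
    """
    n = len(perm)
    profile = []
    for i in range(n + 1):
        # A_i: positions in the past (j < i) mapped to the future (>= i).
        a_i = sum(1 for j in range(i) if perm[j] >= i)
        # P_i: positions in the future (j >= i) mapped to the past (< i).
        p_i = sum(1 for j in range(i, n) if perm[j] < i)
        profile.append(a_i + p_i)
    return profile


identity = [0, 1, 2, 3]
reversal = [3, 2, 1, 0]
print(interleaver_state_profile(identity))
print(interleaver_state_profile(reversal))
```

Under this reading the identity permutation needs no state at all, while the reversal peaks in the middle of the block.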

III.  The  trellis  complexity  of  turbo  codes 

Let us consider turbo codes of rate 1/3 obtained from two equal
binary systematic convolutional encoders of rate 1/2 and
constraint length $\nu$ and a block interleaver
$(\mathcal{I}, \pi)$ of length N.

Theorem  2 

For a turbo code C the state profile is equal to:
$s_C(i) = s_{\mathcal{I}}(i) + c(i)$, with $c(i) \le 2\nu$.

A uniform interleaver of length N is a probabilistic interleaver
that acts as the “average” of all possible interleavers of
length N.

Theorem  3 

For a uniform block interleaver $\mathcal{I}_u$ of length N:
$s_{\mathcal{I}_u}(i) = 2i(N-i)/N$, with $0 \le i \le N$. Its
maximum state complexity is equal to $S_{\mathcal{I}_u} = N/2$.

Fig. 1: State and branch profile for the turbo code of Example 1
(rectangular interleaver, N = 64, 16 × 4).
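The “average over all interleavers” can be checked exhaustively for a small block length. The sketch below assumes that the profile in Theorem 3 reads 2i(N − i)/N (a reading consistent with the stated maximum N/2) and that the per-permutation state count is |{j < i : ρ(j) ≥ i}| + |{j ≥ i : ρ(j) < i}|; it averages that count over all N! permutations with exact rational arithmetic. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def state_profile(perm):
    """Per-permutation state count at each boundary i = 0..N."""
    n = len(perm)
    return [
        sum(1 for j in range(i) if perm[j] >= i)
        + sum(1 for j in range(i, n) if perm[j] < i)
        for i in range(n + 1)
    ]


def average_profile(n):
    """Exact average of the state profile over all n! permutations."""
    total = [0] * (n + 1)
    count = 0
    for perm in permutations(range(n)):
        for i, s in enumerate(state_profile(perm)):
            total[i] += s
        count += 1
    return [Fraction(t, count) for t in total]


N = 4
average = average_profile(N)
closed_form = [Fraction(2 * i * (N - i), N) for i in range(N + 1)]
print(average == closed_form, max(average))
```

Exhaustive enumeration is only practical for very small N; for realistic lengths one would sample random permutations instead.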

Theorem  4 

For a turbo code C formed by two constituent encoders of
constraint length $\nu$ and a uniform block interleaver
$\mathcal{I}_u$ of length N:
$s_{C_u}(i) = 2i(N-i)/N + c(i)$, with $c(i) \le 2\nu$. Its maximum
state complexity is $S_{C_u} = N/2 + c$, with $c \le 2\nu$.

IV.  Reducing  the  complexity  of  turbo  codes 

Given an interleaver $(\mathcal{I}, \rho_{\mathcal{I}})$, the
permutations $\rho_1 = (I, \rho^{-1})$ and $\rho_2 = (\rho, I)$
minimize the complexity parameters of $\rho(C_{\mathcal{I}})$ to
S = 0, B = 1, and E = 4. Using this result, to reduce the
complexity of a turbo code employing a block interleaver $\pi$, we
have considered these two permutations:
$\rho_{min1} = (I, I, \pi^{-1})$ and $\rho_{min2} = (\pi, \pi, I)$.

As an example, impressive results in terms of complexity
reduction through the application of $\rho_{min1}$ and
$\rho_{min2}$ can be obtained for the class of turbo codes
employing row-by-column block interleavers. It can be proved that,
when N is a power of two, $N/2 \le S(C) \le N/2 + 2\nu$. By
applying $\rho_{min1}$ ($\rho_{min2}$, respectively) when
$N_r > N_c$ ($N_r < N_c$), we obtain a consistent reduction to
$S = \nu(2N_c - 1)$ ($S = \nu(2N_r - 1)$).
Example  1 

Consider a turbo code composed of two equal 4-state convolutional
encoders and a rectangular block interleaver with N = 64,
$N_r = 16$ and $N_c = 4$. In Fig. 1 we report the state and branch
profiles of the turbo code evaluated directly and through the
permutation $\rho_{min1}$, showing a significant complexity
reduction.
Abstract - We show that the decoding performance of a simple
turbo code can be improved by cross-entropy minimization via
manipulation of the initial a priori probabilities.

I.  Improving  Turbo  Decoding 

While turbo decoding of parallel concatenated codes (PCC) has
been shown to offer near Shannon-limit performance, it is known
that the decoding is sub-optimal. For example, it has been shown
analytically by McEliece et al. [1] that, for certain received
values of a (14,3) PCC, the turbo decoding process does not
converge. However, this does not cover all cases of
non-convergence. Furthermore, there are also cases where the
turbo decoding process converges to a non-maximum a posteriori
probability (non-optimum) decision.

We investigated the turbo decoding performance when the initial
a priori probabilities (APRP) are biased to the optimally decoded
message for this (14,3) turbo code. This method, which assumes
knowledge of the optimum decision, is referred to in this paper
as the “Genie” Turbo Decoding method (GT). Figure 1a shows the
BER surface when the initial APRPs for the first two of the three
information bits are biased with respect to the optimum decision,
with values ranging from $\delta_1 = \delta_2 = 0$ (correctly
biased) to $\delta_1 = \delta_2 = 1$ (incorrectly biased). The
BER, which is measured for an Eb/N0 of 5 dB, shows a slight
improvement when both bits are biased correctly as opposed to the
unbiased case ($\delta_i = 0.5,\ \forall i$).

Hagenauer et al. [2] have proposed using cross-entropy between
the outputs of the component decoders to detect convergence. The
similarity between the cross-entropy surface (Figure 1b) and the
BER surface (Figure 1a) suggests that the cross-entropy values
may be used to infer initial APRP settings in order to improve
decoding performance.

We modified the turbo-decoding process by biasing the APRPs to
each of the eight possible messages in turn, each for a fixed
number of iterations. The output of the bias that yields the
lowest cross-entropy at the final iteration is then chosen. We
refer to this technique as Entropy Minimization Turbo Decoding
(EMT). Table 1 compares the percentage increase in BER with
respect to optimum decision decoding for the traditional turbo
decoding, EMT, and GT approaches at various Eb/N0 values. The
performance for GT and the traditional turbo decoding is shown
for the average obtained between 50 and 100 iterations, while the
EMT performance is for just 2 iterations (at each of the 8
possible messages).

II.  Results 

It  is  seen  that  GT  always  out-performs  the  traditional  turbo 
decoder  showing  that  there  is  a  potential  for  improvement  at 
all  Eb/N0  by  biasing  the  initial  APRPs;  further,  this  poten¬ 
tial  for  improvement  is  significantly  greater  at  higher  Eb/N0. 
Above  2  dB,  EMT  also  performs  better  than  traditional  turbo 
decoding  and  nears  the  performance  of  GT  at  5  dB. 


Based  on  these  results,  we  believe  it  is  possible  to  im¬ 
prove  the  performance  of  more  practical  turbo-decoders  by 
pre-setting  the  initial  APRPs. 

Figure 1: BER and Cross-Entropy Surfaces (panels a and b, each
plotted for $\delta_3 = 0.5$)


Eb/N0     2 dB    3 dB    4 dB    5 dB
Turbo     6.07    8.17   10.81   14.74
EMT       6.93    7.29    8.67    8.71
GT        5.10    6.15    7.40    8.46

Table 1: Percentage Increase in BER w.r.t. Optimum
Abstract - We present a simplified method for combining turbo
decoding and binary Markov channels. The resulting performance is
slightly worse than that of the best known methods using
supertrellis approaches, but it clearly outperforms traditional
systems based on channel interleaving. Moreover, the complexity
is much lower than in the supertrellis case and the structure of
the encoder does not depend on the parameters of the hidden
Markov model describing the channel.

I.  Introduction 

Many  practical  digital  communications  channels  exhibit 
statistical  dependencies  among  errors.  The  error  pattern  of 
the  discrete  channel  (modulator-real  channel-demodulator) 
can  be  modeled  using  binary  Markov  channels  [1,  2].  It  is 
intuitive  that  the  presence  of  memory  in  these  channels  leads 
to  increased  capacity  relative  to  memoryless  channels  with 
the  same  stationary  bit  error  probability.  In  practice,  many 
communications  systems  make  use  of  a  channel  interleaver  to 
distribute  the  errors  so  that  codes  designed  for  a  memoryless 
channel  can  be  used.  While  the  application  of  interleaving 
does  not  change  the  capacity  of  the  channel,  the  achievable 
performance  of  a  decoder  which  assumes  that  the  channel  is 
memoryless  is  far  away  from  the  real  capacity  of  the  channel. 

Turbo coding for binary Markov channels has been previously
described in [3]. However, the methods proposed in [3] involve a
considerable increase in complexity, since supertrellises jointly
describing the constituent encoders and the hidden Markov models
have to be built. We propose a simplified decoding method which
performs slightly worse than the method in [3]. Besides the
reduced complexity, its main advantage is that there is no need
to change the turbo encoder structure depending on the channel
parameters.

II.  Simplified  turbo  decoding  for  binary 
Markov  channels 

The basic idea of the proposed method is to treat the trellis
describing the binary Markov channel as another constituent
decoder which exchanges extrinsic information with the other
constituent decoders in each of the turbo decoding iterations.
The channel block uses as extrinsic information the estimate of
the probability of the error pattern that is provided by the
constituent decoder blocks. In turn, it produces a new estimate
of this probability, which is used as extrinsic information by
the constituent convolutional decoders. This results in three
different classes of extrinsic information that are exchanged
among the decoding blocks. The proposed method resembles the ones
proposed in [4, 5] for continuous hidden Markov channels and
hidden Markov sources, although, contrary to [4], in this case it
is necessary to iterate over the hidden Markov trellis.
III. Simulation Results

In  order  to  assess  the  performance  of  the  proposed  method, 
we  consider  two  binary  Markov  channels  with  two  states.  For 
the  first  channel,  the  transition  probability  from  the  good  to 
the  bad  state  is  .0486,  and  .0914  is  the  value  of  the  transition 
probability  from  the  bad  to  the  good  state.  For  the  second 
channel  these  values  are  .006943  and  .013057,  respectively.  In 
both  cases,  the  bit  error  probability  in  the  bad  state  is  fixed  to 
.5.  The  performance  of  the  system  is  studied  as  a  function  of 
the  value  of  the  bit  error  probability  in  the  good  state  (notice 
that,  since  all  the  other  parameters  are  fixed,  there  is  a  one  to 
one  correspondence  between  the  bit  error  probability  in  the 
good  state  and  the  stationary  bit  error  probability,  p.) 

We  use  a  rate  1/3  turbo  code  that  includes  a  systematic 
bit  and  two  identical  recursive  8-state  convolutional  encoders 
with  generator  matrix  G(D)  =  and  2111  interleaver 

with length 16384. In order to obtain good performance it is
necessary to use a channel interleaver which “separates” the
Markov channel and the turbo decoder. Each simulation consisted
of at least 40 million bits. For rate 1/3 codes, the bit error
probability corresponding to the capacity of a binary symmetric
channel is p = .174. Therefore, by using channel interleaving and
ignoring the memory of the channel (the usual approach to cope
with bursty channels), it is impossible to send reliable
information through any of these channels when the stationary bit
error probability is higher than .174. However, using the
proposed method, convergence for the first channel is achieved at
p = .18 to .185, which is higher than the memoryless limit and
close to the theoretical limit for this channel (which
corresponds to a value p = .2083). For the second channel
convergence is achieved at p = .19 to .195. The theoretical limit
in this case is p = .2307.
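The memoryless limit quoted above and the relation between the good-state error probability and the stationary bit error probability can be reproduced numerically. The following is a minimal sketch: it solves 1 − h(p) = 1/3 for the binary symmetric channel by bisection, and computes the stationary bit error probability of a two-state Markov channel from its transition probabilities. The good-state error probability 0.01 used in the example is a hypothetical value, not one taken from the simulations.

```python
from math import log2


def stationary_bit_error(p_gb, p_bg, err_good, err_bad=0.5):
    """Stationary bit error probability of a two-state Markov channel."""
    pi_bad = p_gb / (p_gb + p_bg)  # stationary probability of the bad state
    return (1.0 - pi_bad) * err_good + pi_bad * err_bad


def binary_entropy(p):
    """Binary entropy function h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def bsc_threshold(rate, tol=1e-12):
    """Crossover probability p in [0, 1/2] with 1 - h(p) = rate (bisection)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - binary_entropy(mid) > rate:
            lo = mid  # capacity still above the rate: threshold is larger
        else:
            hi = mid
    return (lo + hi) / 2.0


p_limit = bsc_threshold(1.0 / 3.0)
# First channel of the text, with a hypothetical good-state error probability.
p_channel1 = stationary_bit_error(0.0486, 0.0914, err_good=0.01)
print(round(p_limit, 3), round(p_channel1, 4))
```

Because all other parameters are fixed, `stationary_bit_error` is an increasing affine function of `err_good`, which is the one-to-one correspondence mentioned in the text.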
I. Introduction

The weight spectrum of a turbo code [1] is useful in deriving
its performance bounds. Due to the randomness and large size of
the interleaver, it is extremely difficult to obtain the exact
weight spectrum. In the past, the average weight spectrum,
averaged over all interleavers [2], has been used in deriving the
bounds.

By  introducing  several  limiting  factors,  we  are  able  to  de¬ 
rive  an  approximate  weight  spectrum  for  turbo  codes  with 
fixed  interleavers.  The  complexity  of  the  algorithm  grows  only 
linearly  with  the  size  of  the  interleaver. 

II.  Evaluating  the  Weight  Spectrum 

A “global” turbo codeword consists of three binary vectors
$(u, r_1, r_2)$, where u represents information bits and $r_1$
and $r_2$ represent redundant bits. A subcodeword refers to
either $(u, r_1)$ or $(u, r_2)$. One limiting factor introduced
is the maximum weight, $D_{max}$, of codewords. We ignore weights
greater than $D_{max}$ because they have little impact on the bit
error rate (BER). We only consider low input-weight codewords,
since these codewords dominate the lower end of the weight
spectrum when the interleaver guarantees a minimum spreading. A
low input-weight codeword may consist of one or several
Elementary Low-weight Subcodewords (ELWSC). By definition, the
error path of an ELWSC deviates from the zero state only once in
a constituent code. An ELWSC with, say, input weight 2 is
referred to as a w2ELWSC. The weight of an ELWSC is less than
$D_{max}$. This implies that the length of its error path must be
less than a limit M. We define M as the span of ELWSCs. Special
treatment is given to input weights in the “tail” (the last L
bits) of the input sequence to account for the large number of
ELWSCs with these input weights.

To evaluate the weight spectrum, we need to find possible
arrangements (or error patterns) of input weights that result in
low-weight codewords. For example, the most probable input-weight
4 error pattern involving bit a is shown in Fig. 1. CC1 stands
for constituent code 1 and CC2 for constituent code 2. Input bit
pairs $\{a, b_i\}$ and $\{c_i, d_i\}$ form two w2ELWSCs in CC1.
In CC2, these input bits swap their positions and form two other
w2ELWSCs. Note that subscripts are used for $b_i$, $c_i$, and
$d_i$ to indicate that there is more than one set of input bits
that can form such an error pattern with bit a. The search for
these bits is conducted within the span of ELWSCs. For example,
$b_i$ is searched for in the range $(I_a - M, I_a + M)$, where
$I_a$ is the index of bit a in CC1.

This searching process is applied to error patterns of input
weight up to 6.


[Diagram: positions of input bits a, b_i, c_i, d_i in CC1 and,
after the interleaver function, in CC2.]

Fig. 1: Input Weight 4 Error Pattern.


III.  Analysis  of  Different  Interleavers 

In Fig. 2, the legends stand for: I: Uniform Interleaver. II:
Modified Block Interleaver with the prime number set from [1].
III: Modified Block Interleaver with the prime number set
selected from our analysis. IV: Modified S-pseudorandom
Interleaver as described in [3], selected from our analysis. Over
100 bit errors were accumulated for each simulation point. The
union bounds plotted are calculated using the weight spectrum
derived from our analysis, which is performed on the rate-1/3,
(37,21) turbo code with interleaver size 4096. Due to the
randomness of the generating process of interleavers, our
analysis is very useful in picking out the “best” one. The
analysis also provides an approximation of the error floor.
We consider finite-alphabet sequences which are emitted by a
stationary source with unknown statistics:

$\mathbf{X} = X_1, X_2, \ldots, X_i, \ldots$;
$X_1^m = X_1, X_2, \ldots, X_m$; $X_i \in \mathcal{A}$;
$|\mathcal{A}| = A$.


Assume that we are given a training vector $Y_{-N}^{-1}$ which is
governed  by  the  same  probability  law  that  governs  X,  but  is 
drawn  independently  of  X.  In  the  case  where  Fljv  =  w+1, 

(sliding-window case), the independence assumption is essentially
replaced by the assumption that the source is a finite-order
Markov source. Given $Y_{-N}^{-1}$, we need to estimate
$P(X_1 | X_{-t}^0)$ (in order to predict $X_1$ given $X_{-t}^0$,
or compress $X_1$ given $X_{-t}^0$, etc., in cases where the
actual measure $P(X_1 | X_{-t}^0)$ is not available to us).

In order to estimate $P(X_1 | X_{-t}^0)$ one constructs, for any
training sequence $Y_{-N}^{-1}$, some empirical conditional
probability measure $Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)$ of $X_1$
given $X_{-t}^0$, hoping that this empirical conditional
probability measure will be “close” in some sense to the true
$P(X_1 | X_{-t}^0)$.

One common way of generating such an empirical measure is to
evaluate the relative frequency of appearance of each
(t + 2)-vector $X_{-t}^1$ in $Y_{-N}^{-1}$, use it to generate an
empirical probability measure for (t + 2)-vectors, denoted by
$q_{Y_{-N}^{-1}}(X_{-t}^1)$, and from this measure generate a
conditional probability measure
$Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0) = q_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)$
for any t such that $X_{-t}^0$ appears in $Y_{-N}^{-1}$ at least
once.

For example, let $Y_{-N}^{-1} = 0101100$; $t = 0$,
$X_{-t}^1 = 01$, $X_{-t}^0 = X_0 = 0$. Then
$q_{Y_{-N}^{-1}}(01) = 2/6$; $q_{Y_{-N}^{-1}}(00) = 1/6$;
$q_{Y_{-N}^{-1}}(1|0) = \frac{2/6}{2/6 + 1/6} = 2/3$.

For $X_{-t}^0$ that do not appear in $Y_{-N}^{-1}$, we may set
$q_{Y_{-N}^{-1}}(X_1 | X_{-t}^0) = q_{Y_{-N}^{-1}}(X_1 | X_{-K_0}^0)$,
where $X_{-K_0}^0$ is the longest suffix of $X_{-t}^0$ that does
appear in $Y_{-N}^{-1}$. ($K_0$ is defined more precisely below.)
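The relative-frequency construction with the longest-suffix fallback can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the estimator analyzed in the text: sequences are Python strings, a context counts as "appearing" only if it is followed by at least one symbol in the training sequence, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def empirical_conditional(training, context, symbol):
    """Relative-frequency estimate of P(symbol | context) from `training`.

    If `context` is never seen followed by a symbol, the estimate falls
    back to the longest suffix of `context` that is (assumption: the empty
    suffix is allowed, giving plain symbol frequencies).
    """
    for start in range(len(context) + 1):
        suffix = context[start:]
        k = len(suffix)
        # Count the symbol following every occurrence of the suffix.
        counts = Counter(
            training[i + k]
            for i in range(len(training) - k)
            if training[i:i + k] == suffix
        )
        total = sum(counts.values())
        if total > 0:
            return Fraction(counts[symbol], total)
    raise ValueError("training sequence is empty")


y = "0101100"
print(empirical_conditional(y, "0", "1"))    # context seen in training
print(empirical_conditional(y, "111", "0"))  # falls back to suffix "11"
```

For the training sequence 0101100 the pairs 01 and 00 occur 2 and 1 times out of 6, so the estimate of P(1|0) is 2/3.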

But is this choice of an empirical conditional probability
measure optimal for relatively short training sequences? Our aim
is to try to minimize the K-L relative entropy (divergence)
between the true $P(X_1 | X_{-t}^0)$ and
$Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)$, namely
$E \log \frac{P(X_1 | X_{-t}^0)}{Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)}$,
where $E(\cdot)$ denotes expectation with respect to
$P(Y_{-N}^{-1}, X_{-t}^1)$.

In  this  presentation  we  are  treating  this  optimization  prob¬ 
lem  by  deriving  performance  bounds  for  a  restricted  class  of 
empirical  conditional  distributions  (predictors). 


1 This work was done in part while visiting Lucent Bell
Laboratories

2This  work  was  supported  by  the  Fund  for  the  Promotion  of 
Research  at  the  Technion 


Assumption 1 Let us define a random variable
$K_0 = K_0(X_{-t+1}^0; Y_{-N}^{-1})$ to be the largest integer
$i \le t$ such that $X_{-i}^0 = Y_{-i-j}^{-j}$ for some
$1 \le j \le N - i$. ($K_0 = 0$ if $X_0$ does not appear in
$Y_{-N}^{-1}$.)

We assume that the discussion is limited to the class of
empirical conditional probability distributions such that, for
$K_0 < t$,

$Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0) = Q^t_{Y_{-N}^{-1}}(X_1 | X_{-K_0}^0)$

(since for $K_0 < t$ the conditioning is on an event $X_{-t}^0$
that was never observed in $Y_{-N}^{-1}$: only its suffix
$X_{-K_0}^0$ was observed in $Y_{-N}^{-1}$).

Lemma 1 Under Assumption 1 and for any t = 0, 1, 2, 3, ...

$-E_{Y_{-N}^{-1}} \log Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)
 = -E_{Y_{-N}^{-1}} \log Q^t_{Y_{-N}^{-1}}(X_1 | X_{-K_0}^0)
 \ge -E_{Y_{-N}^{-1}} \log P(X_1 | X_{-K_0-1}^0)
 = H_{Y_{-N}^{-1}}(X_1 | X_{-K_0-1}^0)$,

where $E_{Y_{-N}^{-1}}(\cdot)$ denotes conditional expectation
given the value of $Y_{-N}^{-1}$.

If $Y_{-N}^{-1}$ is drawn independently of $X_{-t}^1$, we have:

$-E \log Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)
 \ge -E \log P(X_1 | X_{-K_0-1}^0) = H(X_1 | X_{-K_0-1}^0)$

We call the reader’s attention to the fact that in the “entropy”
expression $H(X_1 | X_{-K_0-1}^0)$, $K_0$ is a r.v. Furthermore,
this “entropy” may be evaluated only if the probabilistic
characterization of the source is available. However, its
usefulness stems from the fact that it is demonstrated that there
indeed exist universal algorithms for generating conditional
empirical measures $Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)$ which are
members of the admissible class defined by Assumption 1, and for
which $-E \log Q^t_{Y_{-N}^{-1}}(X_1 | X_{-t}^0)$ is close to
$H(X_1 | X_{-K_0-1}^0)$.

It  should  be  pointed  out  that, 

H{Xx)  >  H(Xx |X°k0_x)  >  H(Xx jXV,) 
thus  demonstrating  the  non-asymptotic  effect  of  having  a 
"short”  training  sequence. 

While these imposed restrictions are apparently intuitively
satisfying, they also lead to new useful non-asymptotic bounds on
the performance of universal data compression algorithms such as
CTW, LZ and HZ [1] (where similar bounds were derived in the
minimax sense only).
Abstract - The sequential decision problem is studied for loss
functions with memory and finite action spaces. Based on the
theory of Markov decision processes (MDP’s), off-line reference
strategies are characterized. An infinite horizon on-line
strategy, with corresponding normalized “regret” which is
upper-bounded by an $O(n^{-1/3})$ term for an arbitrary
individual sequence of observations of length n, is derived.


$b_t = \mu(s_t, b_{t-1})$, independent of $s_1$ and $b_0$. The
strategy $\mu$ is obtained as a solution to a linear program. As
$\{p(x|s)\}$ varies, it generates a finite set $\mathcal{F}$ of
deterministic off-line reference strategies. In particular, if
the state transitions are deterministic (e.g., if $|A| = 1$),
then the off-line strategies are described in terms of simple
cycles with minimum average weight in a graph whose nodes are in
$S \times B$, and in which an edge from $(s, b')$ to $(f(s), b)$
has a weight $L(s, b', b)$, where s transitions to f(s).


I.  Introduction 

Consider a sequence of observations $x^n = x_1 x_2 \cdots x_n$
for which corresponding actions $b^n = b_1 b_2 \cdots b_n$ result
in non-negative instantaneous losses
$\ell(s_t, b_{t-1}, b_t, x_t)$, $1 \le t \le n$, where $s_t$ is a
state driven by $s_{t+1} = f(s_t, x_t)$ in a finite set S, and
$s_1$ is fixed. The action space B is assumed finite, and
$b_0 \in B$ is an initial action. While including the classical
“sequential decision problem” [1, 2], for which the loss at time
t is independent of $b_{t-1}$, this formulation also captures
cases where there is a cost for switching between actions, or a
long term effect (“memory”) for actions taken at a given time.
Extensions to longer past action memories are straightforward.

In an on-line strategy, $b_t$ is a (possibly random) function of
$x^{t-1}$ and $b^{t-1}$. For memoryless loss functions, the
excess loss accumulated by an on-line strategy over the best
off-line finite-state (FS) strategy (i.e., one in which
$b_t = g(s_t)$, where g is optimized with full knowledge of
$x^n$) is termed the regret. An on-line randomized strategy is
demonstrated in [1] for $|S| = 1$ (see [2] for $|S| > 1$), for
which the normalized expected regret vanishes at an
$O(1/\sqrt{n})$ rate, uniformly over $\{x^n\}$. Here, we present
an analogous result for loss functions with memory.

II.  The  reference  off-line  strategy 


III.  On-line  strategy 

The design of an on-line strategy is actually an instance of
learning with expert advice [4], where $\mathcal{F}$ is a set of
$\beta$ experts. However, the instantaneous loss of a strategy
that follows an expert $F \in \mathcal{F}$ at time t depends on
$b_{t-1}$, which may not agree with F. This memory calls for an
additional block-length parameter that determines how long the
advice of an expert is followed. The discrepancy between on-line
and expert losses at the start of each block is amortized over
the block. Our on-line strategy, inspired by [4], is first
presented for the horizon-dependent case. For a fixed block
length M, at $t = Mk + 1$, $k = 0, 1, \ldots$, we randomly select
F according to

$P_k(F \mid \{C_{F',k}\}, F' \in \mathcal{F})
 = \frac{\exp\{-\eta C_{F,k}\}}{\sum_{F' \in \mathcal{F}} \exp\{-\eta C_{F',k}\}}$

where $\eta > 0$ is a given constant and $C_{F,k}$ is the
cumulative loss of F through time $t = Mk$. The actions of F are
followed through $t = M(k+1)$.
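The block-wise random selection of an expert can be sketched as follows. This is a minimal illustration of exponential weighting only: the expert names, the loss values and the value of η are hypothetical, and the bookkeeping that follows the chosen expert for M steps and updates the cumulative losses is omitted.

```python
import math
import random


def expert_distribution(cumulative_losses, eta):
    """Probabilities proportional to exp(-eta * C_F) for each expert F."""
    # Subtracting the minimum loss leaves the ratios unchanged and avoids
    # underflow when the cumulative losses are large.
    shift = min(cumulative_losses.values())
    weights = {
        f: math.exp(-eta * (c - shift)) for f, c in cumulative_losses.items()
    }
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}


def select_expert(cumulative_losses, eta, rng=random):
    """Draw one expert at the start of a block according to the weights."""
    dist = expert_distribution(cumulative_losses, eta)
    experts = list(dist)
    return rng.choices(experts, weights=[dist[f] for f in experts], k=1)[0]


losses = {"F1": 3.0, "F2": 5.0, "F3": 3.0}  # hypothetical cumulative losses
dist = expert_distribution(losses, eta=0.5)
chosen = select_expert(losses, eta=0.5)
print(dist, chosen)
```

A larger η concentrates the distribution on the experts with the smallest cumulative loss, while η close to zero approaches a uniform choice.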


Theorem  1  Let  M  =  2  (i^)‘/3  and  r)  = 
where $\ell_{\max}$ denotes the maximum loss $\ell(s, b', b, x)$
over $s \in S$, $b', b \in B$, and $x \in A$. Then, the
normalized regret of the on-line strategy is
$\le 1.5\,\ell_{\max} [(\ln \beta)/n]^{1/3}$.


For memoryless loss functions, reference FS deterministic
strategies are justified as follows: If the data are drawn from
an FS source $\{p(x|s), s \in S\}$ (on a discrete or continuous
data space), the expected (normalized) loss over infinite
sequences is minimized (over all strategies
$b_t = \mu_t(x^{t-1}, b^{t-1})$) by the FS strategy
$g(s) = \arg\min_{b \in B} E_p \ell(s, b, x)$. Similarly, here,
the expected loss is given by

$\bar{L}_{p,\mu} = \limsup_{n \to \infty} \frac{1}{n}
 \sum_{t=1}^{n} \sum_{s \in S,\ b', b \in B} P_t(s, b', b) L(s, b', b)$

where $P_t(s, b', b)$ is the joint probability (w.r.t.
$\{p(x|s)\}$ and $\{\mu_t\}$) that
$(s_t, b_{t-1}, b_t) = (s, b', b)$, and
$L(s, b', b) = E_p \ell(s, b', b, x)$. The minimization of
$\bar{L}_{p,\mu}$ over $\{\mu_t\}$ is an average cost per stage
problem for a particular MDP. Assuming that $\{p(x|s)\}$ yields
an irreducible Markov chain, there is [3, Vol. 2, Ch. 4] a
deterministic minimizing strategy

1Work  partially  done  while  visiting  at  HP  Labs. 

2Work  done  while  this  author  was  with  HP  Labs. 


For infinite horizon, time is divided into exponentially growing
super-segments of sizes $\{N_i\}$, in each of which the above
algorithm is used with $N_i$ replacing n in the specification of
M and $\eta$. We show that for all n, the normalized regret is
bounded as in Theorem 1, but with a larger constant.
Abstract - Asymptotically optimal methods of prediction for
Markov sources with unknown memory are suggested. The methods
are based on a modified twice universal scheme.

I.  Introduction 

The  problem  of  prediction  and  the  closely  related  problem 
of  adaptive  coding  of  time  series  is  well  known  in  Information 
Theory,  Probability  Theory  and  Statistics  [1]. 

We consider a source with unknown statistics which generates
sequences $x_1 x_2 \ldots$ of letters from a finite alphabet
$A = \{a_1, \ldots, a_n\}$. We imagine that we have at our
disposal a computer for solving the prediction problem. As input
we consider any finite string $x_1 x_2 \ldots x_t$ of letters
from A, and as output we receive at each time instant t
non-negative numbers
$p^*(a_1 | x_1 \ldots x_t), \ldots, p^*(a_n | x_1 \ldots x_t)$
which are estimates of the unknown conditional probabilities
$p(a_1 | x_1 \ldots x_t), \ldots, p(a_n | x_1 \ldots x_t)$, i.e.,
of the probabilities $p(x_{t+1} = a_i | x_1 \ldots x_t)$;
$i = 1, \ldots, n$. The set $p^*(a_i | x_1 \ldots x_t)$;
$i \le n$ is called the prediction.

The precision of a prediction method is measured by the
divergence between $p$ and $p^*$, and the complexity of a method
is characterized by two numbers: the average time of calculation
at each time instant in bit operations and the memory size in
bits of the program defining the method. Let us denote the set of
Markov sources of memory (or connectivity) k as $M_k(A)$ and let
$M_0(A)$ be the set of all Bernoulli sources.

In  this  report  we  consider  the  prediction  problem  for 
Markov  sources  with  unknown  statistics  and  memory. 

II.  The  Main  Results 

We will use two asymptotically optimal prediction methods for
$M_i(A)$, $i = 0, 1, \ldots$, which were suggested in [2]. The
method $\alpha_i$ is asymptotically optimal in average and
$\beta_i$ with probability one.

According to the twice universal scheme, at each time instant T
a computer compares the average precision of all methods
$\beta_0, \beta_1, \ldots, \beta_N$ on the interval
$t = 1, 2, \ldots, T - 1$ and finds $j_0$ for which
$\beta_{j_0}$ gives the best precision on that interval. Then the
computer uses $\beta_{j_0}$ to predict for the next moment T.
(This resembles the likelihood principle.)

It is clear that the computer should calculate (N + 1) prediction
sets (for $\beta_0, \beta_1, \ldots, \beta_N$) instead of one set
as it does in the case of known memory of the source. So the time
of calculation increases (N + 1) times. Similarly, the memory
space of the computer should be divided into (N + 1) parts in
order to store statistics for $\beta_0, \beta_1, \ldots, \beta_N$.

1This  work  was  supported  by  RPBR  Grant  99-01-00586. 


The new methods are based on a simplified twice universal scheme (STUS). According to STUS, the computer that implements the suggested method compares only two methods, $\beta_{i_1}$ and $\beta_{i_2}$, at each time instant $t$. First, at $t = 1, 2, \ldots, T$ the computer compares $\beta_0$ and $\beta_1$, which are optimal for $M_0(A)$ and $M_1(A)$ ($T$ is a parameter of the method). Then the computer removes the worse method and includes $\beta_2$ in its place. The two remaining methods are compared during the period $[T+1, \ldots, 2T]$, the worse of them is removed, and so on. At each time instant $t$ the computer uses the best method $\beta_{i_j}$ for prediction. (On the first interval $[1, \ldots, T]$, $\beta_0$ is used.) At the moment $(N+1)T + 1$ the computer again includes $\beta_0$ in place of the removed $\beta_{i_j}$, and so on. If $T$ is large enough, the computer will find the best $\beta_i$ and will use it for prediction almost all of the time. On the other hand, this universal scheme is fast and space-efficient because at every moment only two methods are compared instead of all $N+1$ as in the "conventional" twice universal scheme. We designate this method as $\beta^1_{stu}$ and describe two other modifications.
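The selection rule of STUS can be illustrated with a small sketch. It is only one possible reading of the description above: the function name `stus_schedule`, the use of a per-step loss table in place of "precision", the tie-breaking rule (the current method is kept on ties), and the cyclic order in which methods are re-included are all assumptions made for illustration.

```python
def stus_schedule(losses, T):
    """Return the index of the method used for prediction at each time step.

    losses[i][t] is the (hypothetical) loss of method i at step t; a lower
    loss stands in for a better precision. Only two methods are compared in
    any window of length T, and the loser is replaced by the next method.
    """
    n_methods = len(losses)          # assumed to be at least 2
    horizon = len(losses[0])
    champion, challenger = 0, 1      # the first window compares methods 0 and 1
    next_method = 2 % n_methods
    used = []
    for start in range(0, horizon, T):
        window = range(start, min(start + T, horizon))
        # During a window, predict with the winner of the previous window.
        used.extend([champion] * len(window))
        loss_champion = sum(losses[champion][t] for t in window)
        loss_challenger = sum(losses[challenger][t] for t in window)
        if loss_challenger < loss_champion:
            champion = challenger    # the worse method is removed
        # Bring in the next method cyclically, skipping the current champion.
        challenger = next_method
        if challenger == champion:
            challenger = (challenger + 1) % n_methods
        next_method = (challenger + 1) % n_methods
    return used
```

With four methods of which only method 2 has zero loss, the sketch switches to method 2 after the second window and keeps it from then on, while storing statistics for just two methods at a time.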

The method $\beta^1_{stu}$ is effective with probability 1. A simpler method, $\beta^2_{stu}$, is obtained if the computer stops looking for the best method after the moment $(N+1)T$ and, at the moments $(N+1)T+1, (N+1)T+2, \ldots$, predicts with the method $\beta_{i_j}$ that was the best during $[NT+1, \ldots, (N+1)T]$. The new method $\beta^2_{stu}$ is effective on average only. (To simplify the method further, it is possible to use the $\alpha_{i_j}$, which are optimal on average, instead of the $\beta_{i_j}$.) The last modification, $\beta^3_{stu}$, may be used when $N$ is infinite or when it is known only that the source is ergodic. The method $\beta^3_{stu}$ resembles $\beta^2_{stu}$, but the computer includes a randomly chosen method $\beta_i$ from $\beta_0, \beta_1, \ldots$ (Recall that $\beta_i$ is included in place of the worst method $\beta_{i_j}$ at the moments $T+1, 2T+1, 3T+1, \ldots$)

The main property of the suggested STUS may be formulated as follows: if $\beta^1_{stu}$ is used with $T(r) = \lceil \log(1/r) \rceil$, where $r$ is the precision, then for every $M_i(A)$ its precision is asymptotically equal to the precision of the method which is optimal for $M_i(A)$, as $r$ goes to 0.
Abstract - The prediction and probability assignment (PPA) concept is important in lossless image compression. We report on a new approximate technique for PPA based on local optimization.

I.  Introduction 

The aim in universal lossless data compression is to achieve a performance, in terms of average redundancy, that asymptotically meets Rissanen's lower bound [1] for universal coding. A slightly different aim is to minimize the maximal individual redundancy for any sequence. This approach has been well studied by, for example, Shtarkov [2]. An important difference between these two measures is that by studying individual sequences we get a tool for short or limited sequences, i.e., we may get a desired behavior from the first symbol to the last. This difference plays an important role in, e.g., lossless image compression, where the data is by nature limited to the bounds of the image.

We know that the lower bound for universal data compression depends not only on the length of the sequence but also on the number of unknown parameters, $K$, roughly like $\rho(n) \approx \frac{K}{2} \log n$. Thus the aim when constructing a data compression scheme for practical applications is to find a parameterization of the source with a minimal number of unknown parameters without losing any information. It is well known in the lossless image compression community that (linear) prediction is an excellent tool for such a reduction of the number of unknown parameters. Much work has focused on different strategies for universal prediction schemes. These prediction schemes often have some kind of connection with universal data compression, e.g. [3]. Despite the excellent results in the area, their application in lossless image compression requires some further investigation, because we want to minimize the resulting codeword length, which may be a different goal from minimizing the error of the prediction scheme.

In most image compression schemes, prediction and probability assignment (or estimation) are treated independently, so we cannot guarantee that it is possible to make a probability assignment that behaves optimally according to Rissanen's bound. For this reason the prediction and probability assignment (PPA) concept was introduced in [4]. The aim of PPA is to optimize the prediction and the probability assignment

°This  work  was  supported  by  TFR  project  271-98-244. 


together in order to control the behavior of the redundancy in a desired way. This is also of major importance since we usually use some kind of context tree model for our data, and the sequences in each node of a context tree tend to be very short, e.g. fewer than 100 samples, except for a few nodes at small depth. For sequences of limited length it could be disastrous to use a universal source coding scheme which performs correctly only asymptotically and has a non-optimal initial behavior.

II.  The  approximate  PPA  algorithm 
From a theoretical point of view we should be able to construct a PPA scheme with a desired behavior by using a weighting technique. We could calculate the weighted block probability according to $P_w(\cdot) = \int_a \int_\theta \alpha(a, \theta) P_B(\cdot, a, \theta) \, da \, d\theta$, where $P_B(\cdot, a, \theta)$ denotes the block probability for the input data given the prediction parameters $a$ and the probability distribution parameters $\theta$. The function $\alpha(\cdot)$ sets the behavior of the parameter description costs, i.e., the redundancy for not knowing the parameters.

For practical use it might not be feasible to calculate, or to find a closed-form expression for, the block probability $P_B(\cdot)$. For this reason we use local optimization as a tool, since it makes it possible to approximate the block probability. The precision of the approximation will, however, influence the redundancy performance.

In our suggested scheme we find the next-symbol probability distribution according to $P_{lo}(y) = \hat{P}_B(y) / \sum_i \hat{P}_B(x_i)$, where the max-probability function is determined by $\hat{P}_B(x) = \max_a \max_\theta \alpha(a, \theta) P_B(x, a, \theta)$. For the Gaussian probability distribution we have used an approximate distribution function and then simplified the max-probability function further by finding the parameter $a$ by a least-squares criterion, followed by finding the parameter $\theta$, i.e., individual maximization. Our tests show a superior redundancy performance compared to traditional methods.
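The normalization of a max-probability function over the candidate next symbols can be illustrated on a toy case. The sketch below is not the authors' Gaussian scheme: it assumes a binary memoryless (Bernoulli) model with no prediction parameter and a constant weight function, so the maximization over the single parameter has a closed form. The function names are hypothetical.

```python
def max_block_probability(ones, zeros):
    """Maximum over theta of theta**ones * (1 - theta)**zeros.

    The maximum is attained at theta = ones / (ones + zeros).
    """
    n = ones + zeros
    if n == 0:
        return 1.0
    theta = ones / n
    return theta ** ones * (1 - theta) ** zeros


def next_symbol_distribution(sequence):
    """Normalize the maximized block probabilities of the two continuations."""
    ones = sum(sequence)
    zeros = len(sequence) - ones
    p1 = max_block_probability(ones + 1, zeros)   # past followed by a 1
    p0 = max_block_probability(ones, zeros + 1)   # past followed by a 0
    total = p0 + p1
    return {0: p0 / total, 1: p1 / total}
```

Under these assumptions the distribution is uniform before any symbol has been seen, and after a single observed 1 the next-symbol probability of 1 is 1/(1 + 0.25) = 0.8, so the assignment is well defined from the first symbol onward.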
Abstract - In this short contribution we present some novel
results  about  the  reliable  information-rates  supported  by  point- 
to-point  multiple-antenna  Rayleigh-faded  wireless  links  for 
QAM  coded  data-transmissions.  After  deriving  the  (symmetric) 
capacity  of  these  links,  we  present  fast-computable  analytical 
upper  and  lower  bounds  that  are  asymptotically  exact  both  for 
high  and  low  SNR’s  and  give  rise  to  reliable  evaluation  of  the 
link-capacity.  The  proposed  bounds  apply  when  (perfect) 
Channel-State-Information  (CSI)  is  available  at  the  receiver 
and  allow  us  to  understand  clearly  the  ultimate  performance  of 
the  considered  multiple-antenna  QAM  systems.  Furthermore, 
asymptotically  exact  simple  upper  bounds  are  also  presented  for 
a  tight  evaluation  of  the  corresponding  outage  probability  when 
quasi-static  fading  occurs  and  coded  packet-transmission  with 
interleaving  is  used. 

Extended  summary 

The growing demand for high-throughput wireless services experienced in recent years motivates the design of digital transmission systems able to convey increasing data-rates without substantial bandwidth-expansion. At present, typical cellular wireless standards support data-services at about 9-10 kb/s but, recently, there has been interest in providing more sophisticated services at ISDN-compatible data-rates exceeding 100 kb/s using the cellular spectrum. Since the wireless channel is inherently band-limited by multipath phenomena, bandwidth-efficient coding with diversity constitutes an effective means of coping with the deleterious effects of fading. Although wireless systems with multiple antennas at the receiver are today quite common, several important contributions [1,2,6] have recently pointed out that space-diversity at the transmitter can give rise to an extraordinary improvement in the reliable rates conveyable by wireless bandwidth-limited links when CSI is available at the receiver and the receiver also employs space-diversity (see [8] for a comprehensive reference list on this topic). The ultimate reliable throughput supported by point-to-point Rayleigh-faded links with multiple transmit/receive antennas has been evaluated in [1,2] for continuous Gaussian-shaped coding alphabets and has been found to scale linearly with the number of transmit/receive antennas, becoming unbounded for large SNRs. Motivated by these promising information-theoretic results, several coding strategies suitable for actual implementations have more recently been presented [3,4,5,6].

Since the coded systems presented in these contributions provide data-transmissions and therefore rely on finite-size QAM-type constellations, a natural question that is still unanswered concerns the reliable rates effectively supported by multiple-antenna point-to-point wireless systems which employ finite-size data-constellations and are peak-power limited (in this regard, we remark that in [1,2] only the case of a continuous coding alphabet with an average power-constraint is addressed). In this contribution we attempt to give an answer to this question. In particular, we consider a point-to-point multiple-antenna link affected by flat Rayleigh-distributed fading and, under the assumption of perfect CSI at the receiver, we compute the (symmetric) Shannon capacity of the coded channel for QAM transmissions. Since the formula for the capacity resists closed-form evaluation and its computation requires multiple nested numerical integrations, we present some fast-computable upper and lower bounds which provide a reliable (and asymptotically exact) evaluation of the capacity. In addition, the proposed bounds also unveil the ultimate performance limits of peak-power-limited QAM multiple-antenna faded links and point out the impact on the capacity of some important system parameters such as the number of transmit/receive antennas, the constellation size and the employed (average) SNR.

Finally, since actual cellular wireless systems may be impaired by slowly varying (i.e., quasi-static) fading that in fact makes the link-capacity meaningless [7,8], in the last part of this contribution we investigate the outage probability of point-to-point QAM multiple-antenna systems. Since the latter is not analytically computable in closed form, we present some simple Chernoff-type upper bounds which are asymptotically exact and can be utilized in practice for a reliable evaluation of the actual outages. In addition, these bounds directly stress the impact of the number of employed antennas and the interleaving depth on the performance of the considered QAM systems when "block-fading" phenomena affect the transmission link between transmit/receive antennas.
Abstract - The problem of data transmission over
an  unknown  channel  is  considered  and  an  approach  to 
code  design  for  joint  channel  estimation,  equalization 
and  error  correction  is  proposed.  In  contrast  to  most 
traditional  approaches,  where  the  receiver  is  designed 
given  knowledge  of  the  code  used  at  the  transmitter, 
this  paper  proposes  an  approach  where  the  code  is 
designed  based  on  knowledge  of  the  receiver  structure 
and  the  statistical  properties  of  the  channel. 

I.  System  Model 

Consider one-shot transmission of a binary block, $b \in \{\pm 1\}^N$, over a linear filter channel using binary modulation. Assume that for each transmitted block, $b$, a complex vector-valued output, $y = Bh + n \in \mathbb{C}^L$, is measured at the receiver, where $n \in \mathbb{C}^L$ is zero-mean complex Gaussian noise, and $B$ is a matrix containing the transmitted bits: $(B)_{ij} = (b)_{i-j+1}$ for $j \le i$ and $i - j + 1 \le N$, and $(B)_{ij} = 0$ otherwise. The channel coefficients $h \in \mathbb{C}^P$ (with $P = L - N + 1$, assuming $P < N$) are drawn from a complex-valued Gaussian distribution and are assumed constant over the transmission of one block, $b$, but are allowed to vary between blocks. Furthermore, it is assumed that the realization of $h$ is unknown both at the transmitter and at the receiver. A detailed description of the system and the assumptions made can be found in [1].

Since the $P$ channel coefficients in $h$ are unknown, the receiver implements joint maximum likelihood (ML) estimation of $h$ and detection of the transmitted bits, $b$, that is $(\hat{h}, \hat{b}) = \arg\min_{b \in \mathcal{C},\, h \in \mathbb{C}^P} \|y - Bh\|^2$. Hence,

$\hat{b} = \hat{b}(y) = \arg\min_{b \in \mathcal{C}} \|y - BB^+y\|^2$,

where $\mathcal{C} \subset \{\pm 1\}^N$ is the set of allowed codewords and $B^+$ is the pseudo-inverse of $B$. The mapping $\hat{b} : \mathbb{C}^L \to \{\pm 1\}^N$ is the decoder of the system. The operation of this mapping includes (implicit) channel identification. The decoder output, $\hat{b}$, is, however, a function of $y$ only, and a particular received vector is always mapped into the same $\hat{b}(y)$.
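The decoder can be sketched numerically as an exhaustive search over the codebook. This is a minimal illustration, assuming NumPy and a small codebook; the helper names `conv_matrix` and `joint_ml_decode` are hypothetical, and the search is practical only for tiny codes.

```python
import numpy as np


def conv_matrix(b, P):
    """Build the L x P matrix B whose column j holds b shifted down by j rows."""
    N = len(b)
    L = N + P - 1                      # from P = L - N + 1
    B = np.zeros((L, P))
    for j in range(P):
        B[j:j + N, j] = b
    return B


def joint_ml_decode(y, codebook, P):
    """Return the codeword minimizing ||y - B B^+ y||^2 over the codebook."""
    best, best_cost = None, np.inf
    for b in codebook:
        B = conv_matrix(b, P)
        # B B^+ y is the projection of y onto the column space of B.
        residual = y - B @ (np.linalg.pinv(B) @ y)
        cost = np.linalg.norm(residual) ** 2
        if cost < best_cost:
            best, best_cost = b, cost
    return best
```

Because $B$ and $-B$ span the same column space, a codeword and its negation give the same cost in this sketch, so a usable codebook should not contain both.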

II.  Code  Design  and  Performance 

The problem of code design is that of choosing the set of codewords, $\mathcal{C}$, for a given value of $|\mathcal{C}| < 2^N$, such that the word error rate (WER), $\Pr(\hat{b}(y) \ne b)$, is minimized without explicit knowledge of the channel. Note that this implies that the code must both allow for estimation of the channel impulse response and provide good error-correcting capabilities. That is, $\mathcal{C}$ is to be chosen such that it provides an optimal combination of redundancy for channel estimation ("training data") and error protection. Finding the optimal set of codewords, $\mathcal{C}$, is an integer optimization problem, which, in general,

1This  work  was  partially  funded  by  the  Swedish  Research  Coun¬ 
cil  for  Engineering  Sciences,  under  grant  271-99-194. 


is  very  hard  to  solve.  Therefore,  an  approach  based  on  sim¬ 
ulated  annealing  [2]  is  used  herein,  where  the  energy  of  the 
system  is  given  as  a  function  of  the  WER.  Unfortunately,  the 
WER  is,  in  general,  hard  to  derive  and  therefore  a  technique 
based  on  the  union  bound  is  used  instead.  The  union  bound 
gives  an  upper  bound  on  the  WER,  given  knowledge  of  the 
pairwise  error  probabilities.  These  can  be  calculated  using  a 
moment  generating  function  approach  and  closed  form  expres¬ 
sions  are  available  for  both  Rice  and  Rayleigh  channels  [1]. 

The proposed scheme has been used to design a rate $\log_2 |\mathcal{C}| / N = 1/2$ code for a channel with $P = 2$ equally strong Rayleigh fading paths. Three reference cases are also considered. The first scheme uses 7 pilot bits for least squares channel estimation, Viterbi equalization and hard decoding of a (15,11) Hamming code, resulting in an overall code rate of $11/(15 + 7) = 1/2$. The second scheme is identical to scheme one except that the equalizer is provided with genie-aided channel estimates. Finally, the third reference scheme uses optimal ML decoding of the overall code defined by concatenating the 7 pilot bits and the (15,11) Hamming code [1].


As  can  be  seen  in  the  figure,  the  proposed  coding  approach 
significantly  outperforms  the  other  cases,  clearly  illustrating 
the  performance  benefit  of  designing  the  code  for  joint  channel 
estimation  and  error  protection.  Furthermore,  in  [1]  it  is  illus¬ 
trated  that  the  new  scheme  is  quite  insensitive  to  mismatch 
in  the  design  parameters  compared  with  their  true  values. 
Abstract - We give the optimal 4- and 8-state trellises for across-the-subchannels TCM for DMT systems.

I.  Introduction 

TCM can be performed for DMT systems in two ways: coding in parallel and coding across the subchannels. The decoding delay in the latter case is $M$ times less than that in the former case, where $M$ is the number of subchannels [1]. We refer to the latter as across-the-subchannels TCM for DMT systems.

At the receiver input, the SNRs in different subchannels are different due to the channel impulse response. Thus, the minimum weighted Euclidean distance becomes the decision criterion for ML decoding, and hence we use weighted Viterbi decoding. Due to this weighting, the best trellis known for single carrier systems need not be the best in our case.

II.  Classification  of  Trellises 

We classify all the $S$-state trellises into $\gamma$ classes (where $\gamma = \log_2 S$) as $\{S^{(2^x;p)} : 1 \le x \le \gamma\}$, where $S^{(2^x;p)}$ denotes an $S$-state trellis with a node at a level connected to $2^x$ nodes in the next level and having $2^p$ parallel transitions. We label the topmost node as $s_0$ and the last node as $s_{2^\gamma - 1}$.

Definition 1: A cyclic trellis is a trellis in which the branches diverging from a node $s_n$ at any level connect to $2^{b-p}$ nodes of the next level, beginning from $s_{(n \cdot 2^{b-p}) \bmod 2^\gamma}$ and ending at $s_{((n+1) \cdot 2^{b-p} - 1) \bmod 2^\gamma}$, where $b$ is the number of input bits per symbol.
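The connectivity rule in Definition 1 can be written out directly. The following sketch assumes the reading that a node with index n connects to 2^(b-p) consecutive next-level nodes, with indices taken modulo the number of states 2^gamma; the function name `cyclic_successors` is hypothetical.

```python
def cyclic_successors(n, gamma, b, p):
    """List the next-level node indices reached from node n in a cyclic trellis.

    gamma: log2 of the number of states
    b:     number of input bits per symbol
    p:     log2 of the number of parallel transitions
    """
    fanout = 2 ** (b - p)              # distinct next-level nodes per node
    n_states = 2 ** gamma
    first = (n * fanout) % n_states
    return [(first + k) % n_states for k in range(fanout)]
```

For a 4-state trellis with one input bit and no parallel transitions, nodes 0 and 2 both connect to nodes 0 and 1, while nodes 1 and 3 connect to nodes 2 and 3.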


Figure 1: Some possible 4-state trellises: (a) $4^{(2;0)}$ non-cyclic, (b) $4^{(2;0)}$ cyclic, (c) $4^{(2;1)}$ cyclic, (d) $4^{(4;0)}$ cyclic.

Definition  2  :  The  Convergence  length  of  a  trellis  is  defined 
as  the  minimum  of  all  lengths  of  pairs  of  paths  that  diverge 
from  a  node,  excepting  the  parallel  transitions,  and  converge 
at  another  node. 

'This  work  was  partly  supported  by  CSIR,  India,  through  Re¬ 
The upper bound on the convergence length of a trellis is given by [2]

$L_{max} = \lfloor \gamma / b_1 \rfloor + 1$

where $b_1$ refers to that part of the input bits which affects the state of the encoder and $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.

Theorem  1  :  The  convergence  length  of  a  cyclic  trellis  is 
equal  to  Lmax,  i.e.,  cyclic  trellises  achieve  the  upper  bound  on 
the  convergence  length. 

III.  Optimal  4-  and  8-State  Trellises 

Let $b_{min} = \min_{i \in [0, M-1]} \{b_i\}$, where $b_i$ is the number of input bits in the $i$th subchannel, and $s_t w_t = \min_{i \in [0, M-1]} \{s_i w_i\}$, where $s_i$ and $w_i$ are the squared minimum Euclidean distance of the $i$th subchannel symbol constellation and the weighting factor for that subchannel, respectively.

Theorem 2: The best trellis for 4-state across-the-subchannels TCM is

(a) the $4^{(2;0)}$ cyclic trellis, for $b_{min} = 1$;

(b) the $4^{(4;b_{min}-2)}$ cyclic trellis if

$\min_{i \in [0, M-1]} \{2 s_i w_i + s_{i \oplus 1} w_{i \oplus 1}\} > 4 s_t w_t$,

else the $4^{(2;b_{min}-1)}$ cyclic trellis, for $b_{min} \ge 2$.

Theorem 3: The best trellis for 8-state across-the-subchannels TCM is

(a) the $8^{(2;0)}$ cyclic trellis, for $b_{min} = 1$;

(b) the $8^{(4;0)}$ cyclic trellis, for $b_{min} = 2$;

(c) the $8^{(8;b_{min}-3)}$ cyclic trellis if

$\min_{i,k \in [0, M-1]} \{2 s_i w_i + s_k w_k\} > 8 s_t w_t$,

else the $8^{(4;b_{min}-2)}$ cyclic trellis, for $b_{min} \ge 3$.
Abstract - In this paper, we consider noncoherent
communication  over  a  frequency  nonselective  chan¬ 
nel.  Results  from  coherent  coding  theory  are  used 
to  devise  both  low  and  high  rate  codes  for  noncoher¬ 
ent  systems.  Further,  one-dimensional  noncoherent 
codes  with  good  Hamming  distance  properties  can  be 
transformed  into  space-time  noncoherent  codes  which 
achieve  full  transmit  diversity  using  a  block  transfor¬ 
mation. 

I.  Introduction 

Noncoherent  transmission  is  considered  over  a  frequency  non¬ 
selective  channel.  The  channel  gain  is  assumed  to  be  un¬ 
known  but  piecewise  constant  over  a  length  of  time  called 
the  coherence  interval  (and  denoted  by  N),  which  lasts  sev¬ 
eral  symbol  durations.  In  prior  work  [1],  a  noncoherent  “dis¬ 
tance”  was  identified  as  a  performance  measure  for  noncoher¬ 
ent  codes,  analogous  to  the  Euclidean  distance  in  the  coherent 
case.  Also,  a  near-optimal  algorithm  of  linear  complexity  was 
found  for  noncoherent  demodulation.  This  work  considers  the 
design  of  one-dimensional  and  space-time  codes  for  noncoher¬ 
ent  channels,  with  a  focus  on  adapting  simple  coherent  codes 
for  the  noncoherent  setting. 

II.  One-dimensional  noncoherent  codes 

Our results so far indicate that the vast body of knowledge regarding coherent codes can be leveraged, with appropriate modifications, to obtain noncoherent codes. First, the low rate case is considered. A noncoherent code $S_{nc}$ can be obtained from a linear binary code $S$ containing the all-ones codeword as the set of equivalence classes of $S$, where an equivalence class consists of a vector in $S$ and its complement. In this case, the minimum noncoherent distance of $S_{nc}$, as formulated in [1], can be shown to be proportional to the minimum Hamming distance of $S$. Hence, the choice of a good coherent linear binary code for $S$ yields a good low-rate noncoherent code $S_{nc}$. In particular, the (7,4,3) Hamming code yields an optimal set of 8 vectors of length $N = 7$ on a unit sphere, for the noncoherent setting.
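The count of 8 vectors follows from pairing each of the 16 Hamming codewords with its complement. A short sketch of this construction is given below; the particular systematic generator matrix `G` is one standard choice for the (7,4,3) Hamming code and is an assumption here, as are the function names.

```python
from itertools import product

# One systematic generator matrix of the (7,4,3) Hamming code (assumed choice).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def codewords(gen):
    """Enumerate all binary linear combinations of the generator rows."""
    n = len(gen[0])
    words = set()
    for msg in product((0, 1), repeat=len(gen)):
        word = tuple(
            sum(m * row[k] for m, row in zip(msg, gen)) % 2 for k in range(n)
        )
        words.add(word)
    return words


def complement(word):
    """Flip every bit of a binary word."""
    return tuple(1 - bit for bit in word)


def noncoherent_classes(words):
    """Group each codeword with its complement into one equivalence class."""
    return {frozenset((w, complement(w))) for w in words}
```

Since the code contains the all-ones word, the complement of every codeword is again a codeword, so the 16 codewords collapse into exactly 8 classes.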

For  the  high  rate  case,  multilevel  coding  can  be  employed  to 
yield  good  noncoherent  codes.  Varying  degrees  of  protection 
are  provided  to  each  bit  position  in  the  bit  labeling  of  sym¬ 
bols,  using  stronger  or  weaker  codes.  The  linear  complexity 
algorithm  for  the  uncoded  case  can  be  extended  to  the  multi¬ 
level  coding  case,  resulting  in  a  low-complexity  demodulation 
algorithm.  Simulation  results  show  that  a  (7,4,3)  Hamming 
code  applied  to  the  least  significant  bit  of  an  8-PSK  alphabet 
with  Ungerboeck-set  partitioning  gives  a  performance  1.5  dB 
better  than  8-QAM. 

'This  work  was  supported  by  the  National  Science  Foundation 
under  a  CAREER  award  NSF  NCR96-24008CAR. 


III.  Space-time  codes 

A space-time code consists of matrices of size $N \times N_t$ (known as space-time codewords), where $N_t$ is the number of transmitter antennas and the $i$th column denotes the symbols transmitted over antenna $i$ from time 1 to $N$. A common design goal for space-time codes is to achieve full diversity, which implies that the symbol error probability decays asymptotically as $1/\mathrm{SNR}^{N_t}$, where SNR denotes the signal-to-noise ratio and it is assumed that $N > N_t$.

In the noncoherent case, full diversity gain can be shown to be achieved by a code if, for every pair of codewords $\Phi$, $\Theta$, the matrix $(\Phi \;\; \Theta - \Phi)$ has full column rank. In comparison, full coherent diversity gain is achieved if $\Theta - \Phi$ has full column rank. Thus, the following remark holds.

Remark 

A  space-time  code  that  achieves  full  diversity  in  the  nonco¬ 
herent  case  also  achieves  full  diversity  in  the  coherent  case, 
although  the  converse  does  not  hold. 
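The two rank conditions, and the failure of the converse, can be checked numerically. The sketch below assumes NumPy and treats a pair of codewords as N x N_t arrays; the function names are hypothetical.

```python
import numpy as np


def full_coherent_diversity(phi, theta):
    """True if the difference of the two codewords has full column rank."""
    return np.linalg.matrix_rank(theta - phi) == phi.shape[1]


def full_noncoherent_diversity(phi, theta):
    """True if the stacked matrix [phi, theta - phi] has full column rank."""
    stacked = np.hstack([phi, theta - phi])
    return np.linalg.matrix_rank(stacked) == stacked.shape[1]
```

As a counterexample to the converse, take a second codeword equal to the negated first one: the difference has full column rank, so the coherent condition holds, but the stacked matrix repeats the same column space and the noncoherent condition fails.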

Space-time  codes  that  achieve  full  noncoherent  diversity  gain 
can  be  derived  from  one-dimensional  noncoherent  codes,  as  a 
result  of  the  following  theorem. 

Theorem 

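Consider a code $C$ such that every codeword $c = (c_0, c_1, \ldots, c_{N-1})^T$ in $C$ has the same fixed modulus $|c_i|$ for all $i = 0, 1, \ldots, N-1$, and a noncoherent space-time code $S_{nc}$ whose codewords are derived from $C$ as

$$\begin{pmatrix} c_0 & c_0 & \cdots & c_0 \\ c_1 & c_1 z & \cdots & c_1 z^{N_t-1} \\ \vdots & \vdots & & \vdots \\ c_{N-2} & c_{N-2} z^{N-2} & \cdots & c_{N-2} (z^{N_t-1})^{N-2} \\ c_{N-1} & c_{N-1} z^{N-1} & \cdots & c_{N-1} (z^{N_t-1})^{N-1} \end{pmatrix}$$

where $z = \exp(j 2\pi / N)$ is an $N$th root of unity. Then $S_{nc}$ achieves full noncoherent diversity gain if and only if $N_t < N/2$ and the Hamming distance $d_H$ of $C$ satisfies $N_t < d_H < (N - N_t)$.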
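The block transformation in the theorem can be sketched as follows, reading the entry in row i and column t of the space-time codeword as c_i multiplied by z raised to the power i*t. This reading of the matrix, the use of NumPy, and the function name `space_time_codeword` are assumptions for illustration.

```python
import numpy as np


def space_time_codeword(c, n_t):
    """Map a length-N codeword c to an N x n_t space-time codeword.

    Entry (i, t) equals c[i] * z**(i * t), where z = exp(2j*pi/N).
    """
    c = np.asarray(c, dtype=complex)
    N = len(c)
    z = np.exp(2j * np.pi / N)
    i = np.arange(N).reshape(-1, 1)    # time index, one per row
    t = np.arange(n_t).reshape(1, -1)  # antenna index, one per column
    return c.reshape(-1, 1) * z ** (i * t)
```

The first column reproduces the one-dimensional codeword itself, and each further column applies a progressively faster phase rotation along the time axis.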

The  preceding  link  between  one-dimensional  and  space-time 
codes  enables  us  to  exploit  constructions  for  one-dimensional 
noncoherent  codes  (e.g.,  the  multilevel  codes  of  Section  II)  for 
the  design  of  space-time  noncoherent  codes.  The  interested 
reader  is  referred  to  [2]  for  details. 
Abstract - This paper establishes new criteria for stability and for instability of multiclass network models under a given sequencing or routing policy. It also extends previous results on the approximation of the solution to the average cost optimality equations through an associated fluid model: it is shown that an optimized network possesses a fluid limit model which is itself optimal with respect to a total cost criterion. A full version of the paper is available at http://black.csl.uiuc.edu:80/~meyn.

I.  Introduction 

A traditional academic approach to scheduling and routing is to construct a Markov decision process (MDP) model for the network. This involves constructing a controlled transition operator $P_a(x, y)$, which gives the probability of moving from state $x$ to state $y$ when the control decision $a$ is applied. The state space $X$ where $x$ and $y$ live is typically taken as the set of all possible buffer levels at the various stations in the network.

Given an MDP model, and a one-step cost function $c: X \to \mathbb{R}_+$, a solution to the average cost optimal control problem is found by solving the resulting dynamic programming equations. The difficulty with this approach is very well known: when buffers are infinite, this becomes an infinite-dimensional optimization problem. Even when considering finite buffers, the complexity grows exponentially with the dimension of the state space. Hence some form of aggregation is necessary: the Markovian model is simply too detailed to be useful in optimization.
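For reference, the dynamic programming equations mentioned above are commonly written as the average cost optimality equation. The form below is the standard textbook statement rather than a formula taken from this summary; the symbols $\eta^*$ (optimal average cost) and $h$ (relative value function) are notation assumed here for illustration.

```latex
\eta^* + h(x) = \min_{a} \Big[ c(x) + \sum_{y \in X} P_a(x, y)\, h(y) \Big],
\qquad x \in X .
```

One unknown $h(x)$ appears per state, which is why the problem grows with the size of the state space.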

An  elegant  approach  is  to  consider  the  model  in  heavy  traf¬ 
fic  where  a  reflected  Brownian  motion  model  is  appropriate. 
The  paper  [2],  and  many  others,  develop  these  ideas  for  the 
network  scheduling  or  sequencing  problems.  One  is  then  faced 
with  optimizing  a  controlled  stochastic  differential  equation 
(SDE)  model. 

This  paper  builds  upon  the  results  of  [5,  1].  We  develop 
a  general  framework  for  constructing  control  algorithms  for 
multiclass  queueing  networks  based  on  a  fluid  model.  Network 
sequencing  and  routing  problems  are  considered  as  special  cas¬ 
es.  The  following  aspects  of  the  resulting  feedback  regulation 
policies  are  developed  in  the  paper: 

(i)  The  policies  are  stabilizing,  and  are  in  fact  geometrically 
ergodic  for  a  Markovian  model. 

1 Work supported in part by NSF Grants ECS 9403742, ECS 9972957.


(ii) Numerical examples are given. In each case it is shown that the feedback regulation policy closely resembles the average-cost optimal policy.

(iii)  A  method  is  proposed  for  reducing  variance  in  simula¬ 
tion  for  a  network  controlled  using  a  feedback  regulation 
policy. 

The viewpoint arrived at in this paper leads to policies which are similar to those found through a heavy traffic analysis using a Brownian motion approximation. In all of the network models which have been considered to date, one could perform designs on the fluid model, translate these policies as described in the paper, and arrive at the same policy that was obtained using a Brownian motion approximation. Given the greater complexity of the Brownian motion model, we conclude that while diffusion approximations are tremendously useful for analysis, they appear to be less useful for the purposes of control design.
Abstract - Although of practical importance to
managing  large  IP  networks,  measurement-based  net¬ 
work  monitoring  using  distributed  monitors  has  not 
been  rigorously  formulated  nor  investigated.  This 
work  develops  a  missing  data  framework  for  dis¬ 
tributed  monitoring  based  on  multicast,  and  inves¬ 
tigates,  through  density  estimation,  how  resources 
needed  for  network  monitoring  scale  with  the  size 
of  the  network  under  various  network  (loss)  condi¬ 
tions.  The  results  on  the  scalability  provide  insights 
into  feasibility  of  using  only  edge  monitors,  and  pro¬ 
vide  design  guidelines  for  future  network  management 
systems. 


I.  Missing  Data  Formulation 

To  assist  network  managers  in  monitoring  large  and  hetero¬ 
geneous  networks  in  dynamic  environments,  network  monitors 
can  be  allocated  at  either  the  interior  or  the  edges  of  a  man¬ 
aged  network  to  monitor  Quality  of  Service  (QoS)  measures 
such  as  packet  loss  or  delay.  Even  if  network  monitors  are 
deployed  everywhere  in  the  network,  some  of  them  may  be 
occasionally  inaccessible  for  various  reasons.  Hence,  a  gen¬ 
eral  formulation  of  network  monitoring  should  consider  this 
missing  information  aspect. 

We have developed a general theoretical framework for network monitoring using distributed monitors based on a missing data formulation [3], where (a set ($U$) of) missing variables correspond to unobservable network nodes where monitors are neither available nor accessible, and (a set ($O$) of) observable variables correspond to nodes with functional monitors. Our model is in the form of the complete likelihood on both observable and missing variables. We consider network monitoring in the context of multicast probing [2], where network monitors measure the number of probe packets lost at the nodes of a multicast tree. Define the state $X_j$ of node $j$ to be a binary random variable, where $X_j = 1$ if node $j$ receives a probe packet, and $X_j = 0$ otherwise. The resulting complete likelihood function possesses a very simple analytical form
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Pr(Xj  =  Xj,Vj)  =  JJ{a;/(')Ij[(l-Q j)Cjp<fi-’™>*>},  (1) 
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The  estimation  error  between  the  true  (a*’s)  and  estimated 
parameters  (dj’s)  given  measurements  D0b.i  (losses  measured 
by  monitors)  can  be  related  to  the  convergence  rate  as 
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where  e  is  an  error  term  depending  on  the  missing  informa¬ 
tion,  a]  corresponds  to  the  complete  information  for  the  j-th 
unobservable  node[3],  is  the  convergence  rate  of  the  j-th 
EM  equation  and  n  is  the  number  of  probes. 

Using  the  theory  of  density  estimation[l],  we  define  the 
scalability  of  measurement-based  network  monitoring  in  terms 
of  how  the  estimation  error  and  the  convergence  rate  vary  with 
respect  to  the  number  of  probes  and  the  size  of  a  multicast 
tree  under  various  network  conditions.  For  a  uniform  mul¬ 
ticast  tree1  with  small  packet  loss  (Qj  =  1  —  o(l),Vj)  and 
assuming  only  edge  monitors,  the  estimation  error  is  O(^) 
with  M  being  the  total  number  of  unobservable  nodes,  and 
the  convergence  rate  \3  —  1  —  o(l),Vj.  This  corresponds 
to  the  best  achievable  scalability  suggested  by  density  esti¬ 
mation.  When  packet  losses  are  large  across  the  multicast 
tree  (a j  =  o(l),Vj),  the  estimation  error  is  0(-jp-^-)  with 

0  <  0  <  1,  and  Xj  =  o(l),Vj.  This  corresponds  to  the  worst 
scalability  with  an  exponentially  large  number  of  probes  in  the 
depth  of  a  multicast  tree,  and  an  exponentially  slow  conver¬ 
gence  rate.  When  large  losses  occur  locally,  properly  allocated 
distributed  monitors  can  improve  the  scalability  to  the  best 
achievable. 
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where  the  parameter  otj  =  Pr(Aj  —  1  |  Xj (j)  =  1)  with  node 
f(j)  being  the  parent  of  node  j,  Xj  equals  to  0  or  1,  L  is 
the  depth  of  a  multicast  tree,  and  Cj  is  quantity  which  does 
not  depend  on  cr/s.  As  such  a  model  belongs  to  an  expo¬ 
nential  parametric  family,  it  results  in  a  simple  Expectation- 
Maximization  algorithm  to  estimate  the  unknown  parameters, 
Qj’ s,  corresponding  to  unobservable  nodes. 

II.  Scalability  Analysis 
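The loss model of Section I, in which each node j receives a probe with probability $\alpha_j$ given that its parent f(j) received it, can be illustrated with a minimal sketch. It covers only the complete-data case, where every node is observed, so each $\alpha_j$ is estimated by a simple relative frequency; it does not implement the EM algorithm needed when some nodes are unobservable. The tree, the parameter values and the function names are hypothetical.

```python
import random


def simulate_probe(parent, alpha, rng):
    """Send one probe down the tree and return the state x[j] of every node."""
    x = {0: 1}  # node 0 is the root (probe source) and always holds the probe
    for j in sorted(parent):  # assumes every child has a larger label than its parent
        received = x[parent[j]] == 1 and rng.random() < alpha[j]
        x[j] = 1 if received else 0
    return x


def estimate_alpha(parent, samples):
    """Complete-data estimate of alpha_j = Pr(X_j = 1 | X_f(j) = 1)."""
    estimates = {}
    for j in parent:
        parent_received = [x for x in samples if x[parent[j]] == 1]
        if parent_received:
            estimates[j] = sum(x[j] for x in parent_received) / len(parent_received)
        else:
            estimates[j] = float("nan")  # no probe ever reached the parent
    return estimates


# Hypothetical multicast tree: parent[j] is the node f(j).
parent = {1: 0, 2: 0, 3: 1, 4: 1}
alpha = {1: 0.9, 2: 0.8, 3: 0.95, 4: 0.7}

rng = random.Random(0)
samples = [simulate_probe(parent, alpha, rng) for _ in range(20000)]
alpha_hat = estimate_alpha(parent, samples)
```

With all nodes monitored, the estimates settle close to the true values as the number of probes grows; the scalability question in the text concerns how this degrades when only edge nodes are observed.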

Abstract — In a computer network, the traffic matrix,
or the origin-destination (OD) byte counts, are
important statistics needed for design, routing, configuration
debugging, monitoring and pricing. However,
they are not easily available. For a fixed routing
scheme, a statistical inverse algorithm is proposed and
validated to estimate the traffic matrix from the easily
collectable link counts, which are aggregations of
the origin-destination counts.

I.  Introduction 

Practical realities dictate that information needed for managing
computer networks is sometimes best obtained through
estimation. This is true even though exact measurements
could be made by deploying specialized hardware and software.
We consider estimation of origin-destination byte counts
from measurements of byte counts on network links. All commercial
routers can report their link counts through the Simple
Network Management Protocol (SNMP), whereas measuring
complete OD counts on a network is far from routine. The
problem of estimating the OD byte counts from aggregated
byte counts measured on links is called network tomography
by Vardi [1]. The similarity to conventional tomography lies
in the fact that the observed link counts are linear transforms
of unobserved OD counts with a known transform matrix determined
by the routing scheme.
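The linear relation between link counts and OD counts can be sketched as follows. This is an illustrative example only: the single-router network with two nodes, its routing matrix `A` and the byte counts are all hypothetical. It also shows why the inverse problem is hard: two different OD vectors produce identical link counts.

```python
def link_counts(routing, od_counts):
    """Return y = A x: each link count sums the OD counts routed over that link."""
    return [sum(a * x for a, x in zip(row, od_counts)) for row in routing]


# Hypothetical single router with nodes a and b.
od_pairs = ["a->a", "a->b", "b->a", "b->b"]
links = ["origin a", "origin b", "destination a", "destination b"]

# A[i][j] = 1 if OD pair j is carried on link i under the fixed routing scheme.
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

x = [10, 20, 30, 40]      # unobserved OD byte counts
y = link_counts(A, x)     # observed link byte counts

# A different OD vector that is indistinguishable from x given y alone.
x_alt = [11, 19, 29, 41]
y_alt = link_counts(A, x_alt)
```

Because `y` and `y_alt` coincide, the link counts alone do not determine the OD counts, which is why a statistical model is needed.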

II. A Moving IID Gaussian Model with a
Mean-Variance Relationship

We  [2]  study  the  inference  of  OD  byte  counts  from  link  byte 
counts  measured  at  router  interfaces  under  a  fixed  routing 
scheme.  A  basic  model  of  the  OD  counts  assumes  that  they 
are  independent  normal  over  OD  pairs  and  iid  over  successive 
measurement  periods.  The  normal  means  and  variances  are 
functionally  related  through  a  power  law.  We  deal  with  the 
time-varying nature of the counts by fitting the basic iid model
locally  using  a  moving  data  window.  Identifiability  of  the 
model  is  proved  for  router  link  data  and  maximum  likelihood  is 
used  for  parameter  estimation.  The  OD  counts  are  estimated 
by  their  conditional  expectations  given  the  link  counts  and 
estimated  parameters.  OD  estimates  are  forced  to  be  positive 
and  to  harmonize  with  the  link  count  measurements  and  the 
routing  scheme. 
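The conditional-expectation step can be illustrated in the simplest possible setting. The sketch below assumes a single link carrying two independent normal OD flows, so the standard Gaussian conditioning formula reduces to a scalar correction; the power-law form `variance = phi * mean ** c`, the values of `phi` and `c`, and the function name are assumptions made for illustration. The positivity adjustment mentioned in the text is not included.

```python
def od_estimates(link_count, means, phi, c):
    """E[x_i | y] for independent normal OD counts x_i observed through y = sum(x_i)."""
    variances = [phi * m ** c for m in means]   # assumed power-law mean-variance relation
    residual = link_count - sum(means)          # part of y not explained by the means
    total_variance = sum(variances)
    # Each flow absorbs a share of the residual proportional to its variance.
    return [m + v / total_variance * residual for m, v in zip(means, variances)]


means = [100.0, 400.0]                           # hypothetical fitted means
estimates = od_estimates(560.0, means, phi=2.0, c=2.0)
```

The estimates add up to the observed link count, so they harmonize with the measurement, and the flow with the larger variance absorbs the larger share of the discrepancy.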

Simple local likelihood fitting of an iid model is not sufficient
because large fitting windows over-smooth sharp changes
in OD traffic, while small windows cause estimates to be unreliable.
A refinement in which the logs of positive parameters
are modeled as random walks penalizes the local likelihood
surface enough to induce smoothness in parameter estimates
while not unduly compromising their ability to conform to
sharp changes in traffic. We use a fully normal approximation
to this approach and demonstrate how effectively it recovers
OD byte counts for our chosen network.

1 Bin Yu is on leave from University of California, Berkeley.

III.  Validation  with  real  data 

The proposed method is applied to two simple networks
at Lucent Technologies. OD counts are shown to be recovered
with good accuracy relative to the degree of ambiguity
that remains after marginal and positivity constraints are met.
Furthermore, the estimates are validated in a single-router
network for which direct measurements of origin-destination
counts are available through special software.

IV. A SCALABLE ALGORITHM FOR LARGE NETWORKS

It can be seen that for a network of n origins (destinations),
the computational cost of our proposed method will be at least
of order $O(n^5)$, even after taking advantage of sparse matrix
computation. Even for a network of moderate size (e.g.,
n = 100), this is not acceptable.

Since the OD counts come with an estimation accuracy,
the optimization problem in our method does not have to be
solved exactly. This suggests that we could choose subproblems
of smaller size to apply our method, so that the estimation
accuracy remains of the same order of magnitude as for the full
problem but the computational cost is greatly reduced.

A divide-and-conquer scalable algorithm has been devised
based on the principle of local information: most of the information
for estimating the parameters of a particular OD
random variable comes from links nearby. Under this principle,
the OD pairs are clustered into groups, and for each group
of OD pairs, links are selected. For each subproblem of an OD
group and associated links, a parameter reduction is carried
out to minimize the computational cost, so that the computational
cost of the algorithm is $O(n^3)$. This algorithm can
be used on its own or to find an initial estimate for the full
problem.
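To put the two orders of growth quoted in this section side by side, the short calculation below evaluates $n^5$ and $n^3$ at the moderate size n = 100 mentioned in the text. Constant factors are ignored, so the numbers indicate scale only, not actual running times; the function name is hypothetical.

```python
def operation_counts(n):
    """Order-of-magnitude operation counts, ignoring constant factors."""
    full = n ** 5       # full method
    reduced = n ** 3    # divide-and-conquer algorithm
    return full, reduced, full // reduced


full, reduced, speedup = operation_counts(100)
```

At n = 100 the ratio between the two counts is $n^2$, i.e. four orders of magnitude.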

We are currently implementing this algorithm on a large
Lucent network. The results will be compared against those
using the full approach.

Abstract — A methodology to build interval-valued
probability models is presented. It is shown that
this alternative produces temporally stable models of
Internet-generated communications variables.

I.  Introduction 

Experience with such Internet-generated communications
variables as file sizes and packet delays suggests that they
have variable statistical characteristics even when these characteristics
are estimated from huge sample sizes. This variation
is observed over medium-duration (months) time periods
and between sources of similar types. Thus the observed
variations in the parameters of long-range dependent, heavy-tailed
models suggest the need for another class of mathematical
models that can account for the observed common
semi-quantitative behavior in a medium-range temporally stable
manner.

To  this  end  we  turn  to  the  foundations  of  probability  for 
the  class  of  interval-valued  probabilities  and  more  specifically 
to  the  subclass  of  upper  and  lower  envelopes  (an  introduction 
can  be  found,  e.g.,  in  [2]  and  references  therein).  By  doing  so 
we  will  give  up  some  of  the  ability  of  the  standard  probability 
models  to  describe  detailed  dynamics  of  the  traffic  variables 
in  exchange  for  a  more  robust,  temporally  stable  stochastic 
model. 

II. Modeling: Minimal Extension, Lower Envelopes

We base our construction on the following concept (see Sadrolhefazi
and Fine [2]):

Definition. A kernel $(\mathcal{K}, p)$ is a pair with $\mathcal{K}$ a collection of
subsets of a set $\Omega$ that includes $\emptyset$ and $\Omega$, and a set function p
defined on $\mathcal{K}$ satisfying the following four modified axioms of
a lower probability: (i) $p(\Omega) = 1$; (ii) $(\forall A \in \mathcal{K})\ p(A) \ge 0$; (iii)
$(\forall A, B \in \mathcal{K})\ A \cap B = \emptyset \Rightarrow \sup\{p(C) : C \in \mathcal{K},\ C \subseteq A \cup B\} \ge
p(A) + p(B)$ (superadditivity); (iv) $(\forall A, B \in \mathcal{K})\ 1 + \sup\{p(C) :
C \in \mathcal{K},\ C \subseteq A \cap B\} \ge p(A) + p(B)$ (conjugacy).

Theorem. Given a set $\Omega$, a kernel $(\mathcal{K}, p)$, and any algebra
$\mathcal{A} \supseteq \mathcal{K}$, there is a unique minimal extension of p to a lower
probability P on $\mathcal{A}$ such that: (i) P agrees with p on $\mathcal{K}$; (ii)
if Q is any other lower probability on $\mathcal{A}$ agreeing with p on $\mathcal{K}$,
then $(\forall A \in \mathcal{A})\ Q(A) \ge P(A)$.

By partitioning the range of file sizes we find that several
of the intervals of size contain many files, and therefore we
are confident that a relative frequency estimate of their probability
will have high accuracy. These events then lie in $\mathcal{K}$.
In our case, we have frequentist-based probabilities of file size
estimated from a variety of servers. We generate the kernel
p for the events by taking the minima of the individual estimated
probabilities. This process is guaranteed to generate a
function p satisfying the definition of a kernel. We can then
proceed to use minimal extension to complete the kernel to a
lower envelope P.

1 This work was conducted with partial support from NSF Grant
NCR-9725251.

Figure 1: IVP model based on the 1993 Survey accounts well
for the 1999 data
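The step of taking minima over the individual estimated probabilities can be sketched as below. The three servers, their size classes and all numbers are hypothetical. The sketch computes the kernel values on the basic events and, for comparison, the lower envelope of the three estimated measures on a union of two events; it does not implement the minimal-extension construction of the theorem.

```python
def lower_kernel(estimates):
    """Pointwise minimum, over sources, of each event's estimated probability."""
    events = estimates[0].keys()
    return {event: min(source[event] for source in estimates) for event in events}


def lower_envelope(estimates, events):
    """Lower envelope of the source measures on a union of disjoint basic events."""
    return min(sum(source[event] for event in events) for source in estimates)


# Hypothetical relative-frequency estimates of file-size classes on three servers.
servers = [
    {"small": 0.60, "medium": 0.30, "large": 0.10},
    {"small": 0.50, "medium": 0.35, "large": 0.15},
    {"small": 0.55, "medium": 0.25, "large": 0.20},
]

p = lower_kernel(servers)
union_value = lower_envelope(servers, ["small", "medium"])
```

Here the lower value of the union is at least the sum of the lower values of its two parts, which is the superadditivity property required of a lower probability.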

III.  Application  to  Modeling  Unix  File  Size 
Data  Sets 

Our  data  on  Unix  file  sizes  comes  from  two  surveys: 

1. An extensive survey was conducted in 1993 by Irlam
([1]): over 1,000 file systems of different organizations
were surveyed, representing roughly 250 gigabytes distributed
in approximately 12 million files.

2. A smaller survey was conducted in August 1999 on
two Unix file servers, THOR and TITAN, of the Cornell
Electrical Engineering department network, with
roughly 1 million files, totalling 46.5 gigabytes.

Table 1 reveals some significant differences between the distributions
of file sizes in the two surveys. However, as can be seen
in Figure 1, an IVP model built as explained in the previous
section and based on the 1993 Survey accounts well for the
1999 data.

Abstract — An approach to the problem of linear
prediction is discussed that is based on recent developments
in the universal coding and computational
learning theory literature. This development provides
a novel perspective on the adaptive filtering problem,
and represents a significant departure from traditional
adaptive filtering methodologies. In this context, we
demonstrate a sequential algorithm for linear prediction
whose accumulated squared prediction error, for
every possible sequence, is asymptotically as small as
that of the best fixed linear predictor for that sequence.

I.  Linear  Prediction 

In this work, we consider the problems of adaptive filtering
and linear prediction in a competitive algorithm framework.
Given a data sequence $x^n = \{x[t]\}_{t=1}^{n}$, the optimal set of
p coefficients, $w_k$, k = 1, ..., p, that minimizes the total
prediction error

$$E_w[N] = \sum_{n=1}^{N} \Big( x[n] - \sum_{k=1}^{p} w_k x[n-k] \Big)^2 ,$$

is uniquely determined and certainly depends on the input sequence.
Recently, a linear prediction algorithm was presented
that asymptotically achieves the minimum average sequentially
accumulated prediction error over all linear predictors
of order p, i.e., $\min_w E_w[N]$, for every individual sequence [1].
In this work, we somewhat modify the algorithm, and as a
result both improve the algorithm's performance, in terms of
the bound on the redundancy, and provide a more intuitive
proof of this bound.

II.  p-th-Order  Linear  Prediction 

We consider the problem of linear prediction with a filter of
fixed order p, parameterized by the vector $w = [w_1, \ldots, w_p]^T$,
with predicted value $\hat{x}_w[n] = w^T \mathbf{x}[n]$, where
$\mathbf{x}[n] = [x[n-1], \ldots, x[n-p]]^T$. Let x[n], n = 1, ..., N, be a bounded,
but otherwise arbitrary, sequence such that $|x[n]| < A$,
where A need not be known in advance. Let $l_n(x, \hat{x}_w)$ be
the running total squared prediction error, i.e.,
$l_n(x, \hat{x}_w) = \sum_{t=1}^{n} (x[t] - \hat{x}_w[t])^2$. Define a universal predictor $\hat{x}_u[n]$ as
$\hat{x}_u[n] = w_u[n-1]^T \mathbf{x}[n]$, where $w_u[n] = [R_{xx}^{n+1} + \delta I]^{-1} r_{xx}^{n}$,
$R_{xx}^{n} = \sum_{k=1}^{n} \mathbf{x}[k]\mathbf{x}[k]^T$, $r_{xx}^{n} = \sum_{k=1}^{n} x[k]\mathbf{x}[k]$, and $\delta > 0$ is a
positive constant.
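A minimal sketch of a regularized least-squares predictor of this kind follows. It rests on one reading of the definition above, namely that the correlation matrix already includes the current regressor vector when x[n] is predicted while the cross-correlation vector only uses samples seen so far; that reading, the zero-padding before the start of the sequence, and the helper names are assumptions. A small Gauss-Jordan solver is included to keep the example self-contained.

```python
def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def universal_predictions(x, p, delta):
    """Sequentially predict each x[n] from its p predecessors."""
    R = [[delta if i == j else 0.0 for j in range(p)] for i in range(p)]  # delta * I
    r = [0.0] * p
    predictions = []
    for n in range(len(x)):
        # Regressor [x[n-1], ..., x[n-p]], zero-padded before the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(1, p + 1)]
        # Include the current regressor in R before predicting; x[n] is not used yet.
        for i in range(p):
            for j in range(p):
                R[i][j] += u[i] * u[j]
        w = solve(R, r)
        predictions.append(sum(wi * ui for wi, ui in zip(w, u)))
        # Only now is x[n] revealed and the cross-correlation vector updated.
        for i in range(p):
            r[i] += x[n] * u[i]
    return predictions


first_order = universal_predictions([1.0] * 5, p=1, delta=1.0)
second_order = universal_predictions([1.0] * 200, p=2, delta=1.0)
```

On a constant sequence the first-order predictions start at zero and move toward the true value as more samples are seen, showing how the regularization term dominates early and fades later.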

Theorem 1 The total squared prediction error of the p-th-order
universal predictor, $l_n(x, \hat{x}_u) = \sum_{t=1}^{n} (x[t] - \hat{x}_u[t])^2$, satisfies

$$l_n(x, \hat{x}_u) \le \min_w \{ l_n(x, \hat{x}_w) + \delta \|w\|^2 \} + A^2 \ln \left| I + R_{xx}^{n} \delta^{-1} \right| ,$$

$$\frac{1}{n} l_n(x, \hat{x}_u) \le \min_w \frac{1}{n} \{ l_n(x, \hat{x}_w) + \delta \|w\|^2 \} + \frac{A^2 p}{n} \ln\Big( 1 + \frac{A^2 n}{\delta} \Big) .$$

Theorem 1 states that the average squared prediction error
of the p-th-order universal predictor is within $O(A^2 p \ln(n)/n)$
of the best batch p-th-order linear prediction algorithm, for
every individual sequence $x^n$. The idea behind the universal
predictor and the proof of the Theorem is as follows. We
define a "probability" assignment of each of the continuum
of predictors $w \in \mathbb{R}^p$ to the data sequence $x^n$ such that the
probability will be an exponentially decreasing function of the
total squared error for that predictor. Over the continuum of
predictors with coefficients w, we assign a Gaussian prior over
these probabilities, and define the universal probability to be
the Bayesian mixture of these probabilities. With the Gaussian
prior, we can obtain the universal probability in closed
form. Since the probabilities assigned by every predictor can
also be found in closed form, we can compare the universal
probability to that of the best batch predictor for each sequence.

We note that the conditional universal probability is Gaussian
distributed about the same Bayesian (time-varying) mixture
of predictor outputs as that applied to the individual predictor
probabilities; however, it is not in the form of an exponentially
decreasing function of the prediction error of a particular
predictor. In [1], the conditional mean of this distribution
was used as a predictor and was shown to be universal using
a convexity argument to bound its excess prediction error.
However, the convexity argument required construction of a
new Gaussian, centered about the same mean, which was both
larger than the universal probability over the range of the data
and also in the form of an exponentially decreasing function
of the accumulated prediction error. This led to a redundancy
proportional to $O(4A^2 p \ln(n)/n)$, four times larger than that
achieved here. In this work, we search for a new Gaussian
in the proper form, with a different mean and variance, that
is larger than the universal probability over the range of the
data. By symmetry arguments, we obtain the new mean and
variance that minimize the resulting redundancy of the universal
predictor. The resulting predictor $\hat{x}_u[n]$ can be viewed
as the least-squares batch solution over the past, where we assume
that x[n] = 0 and update $r_{xx}$ and $R_{xx}$ accordingly
before predicting x[n].

Abstract - For the first time it is here shown that Symbol-
by-Symbol Maximum A Posteriori (SbS-MAP) receivers
are able to generate Non-Linear Minimum Mean Square
Error (NL-MMSE) estimates of the transmitted symbols.

I.  Introduction 

SbS-MAP receivers have the appealing feature of being
able to generate a kind of soft information that can be
considered intermediate between hard decisions and A
Posteriori Probabilities (APPs): the NL-MMSE estimates of
the transmitted symbols. This result is not a surprise, since
MMSE estimation is defined through an “a posteriori”
expectation functional. Nevertheless, the fact that one can
generate NL-MMSE estimates through an SbS-MAP
receiver has never been clearly pointed out in the current
literature.

The  availability  of  NL-MMSE  estimates  of  the 
transmitted  symbols  is  very  useful  in  many  applications, 
especially  in  those  applications  where  it  is  necessary  to 
mitigate  the  effects  of  wrong  hard  decisions.  This  has  been 
recently  pointed  out  in  [1],  although  no  method  for 
computing  the  NL-MMSE  was  given. 

II.  A  General  Model  of  the  Observations 

The general model of a signal transmitted over a noisy
and dispersive time-invariant channel is considered here.
The random data sequence {s(k)}, constituted by M-ary,
generally complex, i.i.d. equiprobable symbols, is transmitted
over a linear channel whose time-invariant equivalent L-long
discrete-time impulse response is denoted by {g(k)}. Thus,
the ISI-impaired noisy sequence observed at the output of a
baud-rate sampled whitened matched receiving filter can be
modeled by the usual relationship:

$$y(i) = \sum_{k=0}^{L-1} g(k)\, s(i-k) + v(i) = G^T x(i) + v(i), \qquad (1)$$

where G is the L-long impulse response vector of the ISI
channel, $x(i) = [s(i) \ldots s(i-L+1)]^T$ is the corresponding
channel-state vector and {v(i)} is a complex zero-mean
Gaussian noise sequence. The L-variate random sequence
{x(i)} is a first-order Markov chain known as the “state
sequence” of the ISI channel and may assume $N = M^L$ distinct
values $\{\xi_k\}$.

III.  NL-MMSE  Estimation  and  APPs 

The MMSE estimate of the symbol s(i) on the basis of
the observations from step 1 to step i is given by the
following relationship:

$$\hat{s}_{MMSE}(i) = E\{ s(i) \mid y_1^i \} = \sum_{k=1}^{M} s_k \Pr( s(i) = s_k \mid y_1^i ), \qquad (2)$$

It is possible to prove that (2) can be rewritten in the
following form:

$$\hat{s}_{NL\text{-}MMSE}(i; L) = \Xi\, \pi(i|i), \qquad (3)$$

where $\hat{s}_{NL\text{-}MMSE}(i; L)$ is the vector containing the NL-MMSE
estimates of the last L transmitted symbols, $\pi(i|i)$ is
the vector of the APPs of the state sequence of the ISI
channel and $\Xi$ is an L×N matrix whose columns are
constituted by the vectors $\{\xi_k\}$ of (2). The relationship in (3)
shows that the NL-MMSE estimates of the last L transmitted
symbols can be expressed as a function of the APPs of the
state of the ISI channel.
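Relationship (3) amounts to averaging the possible channel-state vectors with their a posteriori probabilities. The sketch below assumes a binary antipodal alphabet, a channel memory of L = 2 and a made-up APP vector; in a real receiver the APPs would come from the SbS-MAP recursion, which is not implemented here. The function name and the ordering of the states are illustrative choices.

```python
from itertools import product


def nl_mmse_estimates(alphabet, L, state_apps):
    """Return the APP-weighted average of the N = M**L channel-state vectors."""
    # Each state is one column of the L x N matrix: (s(i), s(i-1), ..., s(i-L+1)).
    states = list(product(alphabet, repeat=L))
    if len(states) != len(state_apps):
        raise ValueError("one APP is required per channel state")
    return [
        sum(app * state[row] for state, app in zip(states, state_apps))
        for row in range(L)
    ]


alphabet = (-1.0, 1.0)                # hypothetical binary antipodal symbols
# States in order: (-1,-1), (-1,+1), (+1,-1), (+1,+1)
apps = [0.1, 0.2, 0.3, 0.4]           # hypothetical APPs of the four states
estimates = nl_mmse_estimates(alphabet, 2, apps)
```

The resulting estimates lie strictly between the two symbol values and shrink toward zero when the APPs are less decisive, which is the sense in which they sit between hard decisions and the APPs themselves.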

IV.  Conclusions 

In the present contribution, we presented a new method
for generating NL-MMSE estimates with an SbS-MAP
receiver. This method makes the use of SbS-MAP receivers
very appealing because they can generate three kinds of
information: hard-statistics based information (the hard
decisions), soft-statistics based information (the APPs) and
an intermediate case represented by the NL-MMSE
estimates of the transmitted symbols.

In general, the use of “estimates” in place of “decisions”
is useful whenever the reliability of the hard decisions is
low. In fact, a wrong hard decision is certainly more harmful
(to channel estimation and tracking or to systems with
feedback) than an imperfect estimate of the transmitted
symbol [2, 3]. Other useful applications that may be foreseen
for the proposed technique are in the field of multi-user
detection.

Abstract — We deal with the estimation of the structure
of the covariance matrix of the noise and its application
to adaptive radar detection of coherent pulse
trains in compound-Gaussian clutter. Resorting to
secondary data, free of signal components, we propose
an estimator which, plugged into the NMF in place
of the actual covariance matrix, leads to an adaptive
detector that is CFAR with respect to the statistics of the
noise.


I.  Introduction 

The  design  of  detection  schemes  optimized  under  non- 
Gaussian,  clutter-dominated,  disturbance  is  motivated  by  the 
experimental  evidence  that  the  Gaussian  assumption  is  no 
longer  met  for  clutter  returns  as  viewed  by  high  resolution 
radars.  These  returns  are,  instead,  more  accurately  described 
in  terms  of  compound-Gaussian  processes  [1,  and  references 
therein]. 

It is of primary concern to come up with canonical receivers,
namely detectors whose structure as well as the distribution of
the decision variable (under the noise-only hypothesis) is independent
of the clutter statistics. In [1] it is shown that the
Generalized Likelihood Ratio Test admits a sufficient statistic,
referred to in the following as NMF, independent of the clutter
amplitude probability density function if the number of
integrated pulses, N say, becomes increasingly large. In order
to come up with a completely adaptive detector, the key point
is to substitute into the NMF the covariance matrix M of the
noise with a suitable estimate of the structure of M, $\Sigma$ say.
We propose a new estimate of $\Sigma$, based upon secondary data,
and demonstrate that the corresponding adaptive scheme is
CFAR with respect to the clutter statistics. The performance
assessment shows that its loss (with respect to the NMF) is always
acceptable, and often negligible, in scenarios of practical
interest for radar applications.


II.  Problem  Formulation  and  System  Design 

The problem of detecting a radar signal in additive, clutter-dominated
disturbance can be posed in terms of the following
binary hypothesis test:

$$H_0:\ r = c, \quad r_k = c_k, \quad k = 1, \ldots, K;$$
$$H_1:\ r = \alpha p + c, \quad r_k = c_k, \quad k = 1, \ldots, K;$$

where r, p, c, and the $c_k$'s, k = 1, ..., K, denote the
N-dimensional complex vectors of the samples from the baseband
equivalents of the received signal, the signature of the
wanted target echo, the noise (all of them from the range cell
under test), and of the secondary data, respectively, while $\alpha$
is an unknown, possibly complex, parameter accounting for
the target radar cross section. Moreover, c and the $c_k$'s can be
thought of as zero-mean Spherically Invariant Random Vectors
or, otherwise stated, they can be written in the form [1]

$$c = s g, \quad c_k = s_k g_k, \quad k = 1, \ldots, K,$$

where g and the $g_k$'s are complex, zero-mean Gaussian vectors,
s and the $s_k$'s are real, non-negative random variates,
and s and g, and similarly $s_k$ and $g_k$, k = 1, ..., K, are mutually
independent. We also assume that $\{g, g_1, \ldots, g_K\}$ is a set
of independent, identically distributed, circularly symmetric
vectors, while $\{s, s_1, \ldots, s_K\}$ is a set of samples drawn from a
non-negative, possibly correlated, wide-sense stationary random
process with finite mean square value that, without loss
of generality, we suppose in the sequel to be unitary.

We cluster the K secondary data into groups of cells sharing
the same value of the texture: each group consists of $K_s$ cells,
i.e.,

$$s_k = \tilde{s}_{\lceil k/K_s \rceil}, \quad k = 1, \ldots, K,$$

where $K = K_s \times K_G$, with $K_G$ denoting, in turn, the number
of groups, and $\lceil x \rceil$ is the minimum integer greater than or
equal to x. Finally, we assume that the power spectral density
of the baseband equivalent of the clutter is symmetric about
f = 0: it implies that $M = E[r_k r_k^H] = 2M^{(11)} = 2E[r_k^{(1)} r_k^{(1)T}]$,
with H denoting conjugate transpose, T transpose, and $r_k^{(1)}$
the real part of the vector $r_k$, k = 1, ..., K.
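The grouping of secondary cells by texture can be sketched with a small generator of compound-Gaussian secondary data of the form $c_k = s_k g_k$. The texture law used here (square root of a unit-mean exponential variate, which has unit mean square), the white Gaussian speckle, and the function names are assumptions for illustration; the source does not specify them.

```python
import math
import random


def group_index(k, cells_per_group):
    """Index of the group containing cell k (cells and groups numbered from 1)."""
    return math.ceil(k / cells_per_group)


def secondary_data(K, cells_per_group, N, rng):
    """Generate K cells c_k = s_k * g_k whose texture s_k is shared within each group."""
    n_groups = K // cells_per_group
    # Hypothetical texture law with unit mean square value.
    textures = [math.sqrt(rng.expovariate(1.0)) for _ in range(n_groups)]
    cells = []
    for k in range(1, K + 1):
        s = textures[group_index(k, cells_per_group) - 1]
        # Circularly symmetric complex Gaussian speckle (white, for simplicity).
        g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]
        cells.append((s, [s * gi for gi in g]))
    return cells


rng = random.Random(1)
cells = secondary_data(K=8, cells_per_group=4, N=3, rng=rng)
```

Cells 1 to 4 share one texture value and cells 5 to 8 another, which is the structure the proposed estimator exploits when it normalizes each group separately.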

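Notation. Let $\mathcal{N} = \{1, \ldots, N\}$, let $\mathcal{P} \subset \mathcal{N}$ have cardinality P,
and let the complement of $\mathcal{P}$ with respect to $\mathcal{N}$ be denoted as
$\bar{\mathcal{P}}$. For any N-dimensional vector x, $x_{\mathcal{P}}$ is obtained from x by
striking out the ith component $\forall i \in \bar{\mathcal{P}}$. For any N×N matrix
A, $A_{\mathcal{P}\mathcal{P}}$ is obtained from A by striking out the ith row and
the ith column $\forall i \notin \mathcal{P}$.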

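We propose the following estimate of the structure $\Sigma^{(11)}$ of
$M^{(11)}$:

$$\hat{\Sigma}^{(11)} = \frac{1}{K_G} \sum_{k_G=1}^{K_G} \frac{\sum_{k=(k_G-1)K_s+1}^{k_G K_s} r_k^{(1)} r_k^{(1)T}}{\left| \left( \sum_{k=(k_G-1)K_s+1}^{k_G K_s} r_k^{(2)} r_k^{(2)T} \right)_{\mathcal{P}\mathcal{P}} \right|^{1/P}}, \qquad (1)$$

where $|\cdot|$ denotes the determinant of a square matrix and $r_k^{(2)}$
is the imaginary part of the vector $r_k$, k = 1, ..., K.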

It can be shown that $\hat{\Sigma}^{(11)}$ is well-defined when $P < K_s$.
Assume also K > N. Then, it can be shown that the NMF
with $\hat{\Sigma}^{(11)}$, given by (1), in place of M is CFAR with respect
to M. Obviously, such a detector is also CFAR with respect to
the statistics of the texture. Finally, not only does it tend to the
NMF as K diverges, but its loss is also acceptable for finite
K, thus showing its effectiveness in real environments.

Abstract — A two-step sub-optimal algorithm for
decoding binary product codes is discussed. This algorithm
realizes at least half the minimum Euclidean
distance of the code. The fundamental geometric
properties associated with the algorithm are investigated,
and bounds on the number of nearest neighbors
are derived. This investigation also results in
an improved algorithm which achieves the minimum
effective error coefficient, the number of minimum-weight
codewords in the product code.

I.  Introduction 

A product code $C_p = C_r \times C_c$ contains all the matrices whose
columns are codewords in the code $C_c$ and whose rows are codewords
in $C_r$. The parameters of the product code are given by
$[n_p, k_p, d_p] = [n_r n_c, k_r k_c, d_r d_c]$, where n denotes the length,
k the dimension, and d the minimum Hamming distance of
the corresponding code.
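The parameter relation for product codes can be checked by brute force on a small example. The sketch below builds the product of two [3, 2, 2] single-parity-check codes from the Kronecker product of their generator matrices (a standard construction, used here as an assumption about how to realize the definition above) and enumerates all codewords. The choice of component code and the function names are illustrative.

```python
from itertools import product


def codewords(generator):
    """Enumerate all binary codewords spanned by the rows of a generator matrix."""
    n = len(generator[0])
    words = set()
    for message in product((0, 1), repeat=len(generator)):
        word = [0] * n
        for bit, row in zip(message, generator):
            if bit:
                word = [(w + r) % 2 for w, r in zip(word, row)]
        words.add(tuple(word))
    return words


def kron_generator(col_gen, row_gen):
    """Generator of the product code; each codeword is a matrix flattened row by row."""
    return [
        [a * b for a in col_row for b in row_row]
        for col_row in col_gen
        for row_row in row_gen
    ]


def parameters(generator):
    """Return (length, dimension, minimum Hamming distance) by exhaustive search."""
    words = codewords(generator)
    k = len(words).bit_length() - 1          # log2 of the number of codewords
    d = min(sum(w) for w in words if any(w))
    return len(generator[0]), k, d


spc = [[1, 0, 1], [0, 1, 1]]                 # [3, 2, 2] single-parity-check code
product_gen = kron_generator(spc, spc)
component_params = parameters(spc)
product_params = parameters(product_gen)
```

For this example the enumeration gives a length of 9, a dimension of 4 and a minimum distance of 4, matching the products of the component parameters.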

Product (iterated) codes were introduced by Elias in 1954
[3], and studied by many researchers until the late 70's. Several
hard decision decoding techniques were proposed at that
time for decoding a product code up to its guaranteed error-correction
capability. Reddy and Robinson [6] gave a general
decoder for any product code, with good correction capabilities
for simultaneous burst and random errors. Yu and
Costello [8] proposed a generalized minimum distance decoder
for Q-ary output channels. In 1993 product codes gained renewed
attention with the soft decision decoder of Lodge et al.
[5], and the birth of turbo (iterative) decoding. While Lodge
et al. [5] used the a posteriori probability as the reliability
measure for each bit, others, e.g. [7], employed suboptimal
reliability measures that are less computationally involved.

II.  Decoding 

The proposed decoding technique [2] is not an iterative one,
nor does it require explicit reliability-measure calculations for
each bit. Rather, it is a suboptimal soft decision decoding
scheme, more in the line of the aforementioned work [6], [8],
operating as follows. Each of the component codes is soft-decision
decoded separately, rows (columns) and then columns
(rows), while passing a simple, hard-limited reliability measure
from the rows (columns) to the columns (rows). The
result of the columns (rows) decoders is taken as the output.
Generally speaking, while turbo decoding reduces the probability
of bit error, the proposed technique is aimed at reducing
the probability of codeword error.

III.  Analysis  and  conclusions 

We prove [2] that if the decoders of the component codes
realize half the minimum Euclidean distance of these codes,
then the complete decoding scheme realizes half the minimum
Euclidean distance of the product code. Such a scheme is
known as bounded distance (BD) decoding. An analysis of
the decision region associated with this decoding scheme is
given, revealing the following phenomena: i) regardless of the
specific choice of a BD decoder used for decoding the component
codes, the complete decoding scheme is always better
than strictly BD decoding; ii) the algorithm contains pseudo
nearest neighbors [1]. Based on the above analysis, an upper
bound on the number of conventional nearest neighbors, i.e.
the effective error coefficient, is derived. Furthermore, it is
shown that the minimum effective error coefficient is achievable,
as in the case of optimal decoding, by using a slightly
modified decoding scheme.

The proposed decoding algorithms may be attractive for
practical implementation due to their low decoding
complexities. Decoding involves on the order of $n_r + n_c$ applications of a
component code decoder. For comparison, a single iteration of
a block turbo-decoding scheme requires on the order of $O(n_r n_c)$
such applications. Also, due to their geometrical properties,
the algorithms can be employed as stopping criteria (within
the framework of block turbo-decoding) for terminating the
iterative process. Since these algorithms aim at reducing the
probability of word error, they are good candidates for the
decoding of coset product codes [4].
Abstract - Product codes have been an effective
coding method for communication channels where
both random and burst errors occur. In this paper,
we present a new approach to the structure and
Maximum Likelihood (ML) decoding of product codes
using Tanner graphs. For product codes having a
sub-code which is a product of simple parity codes and
repetition codes, we show how to obtain a sub-code
with an acyclic Tanner graph and the largest
possible distance. We show that in all cases of interest,
an n-dimensional product code has such a structure.
Wagner rule decoding is used on this sub-code and its
cosets to obtain an effective and efficient
maximum-likelihood decoding of the given product code.

I.  Introduction 

The product codes first proposed by Elias in the 1950's
are multi-dimensional codes constructed by combining
simpler component codes. Experience has shown that product
codes generally have good random-error-correction and
burst-error-correction capabilities. In [1], Tanner extended
earlier work by Gallager on low-density parity-check codes to
product codes using bipartite graphs, since known as Tanner
graphs. It is well known that using this approach one can
construct convergent decoding algorithms for codes with acyclic
graphs. The question of which codes have acyclic Tanner
graphs was answered categorically in [2]. In [3], it was shown
that decomposition of a code into an acyclic sub-code and
its cosets can provide an efficient method for the
maximum-likelihood decoding of some of the best known linear block
codes. In the present work, we concentrate on product codes
for which the row and column codes are based on well known
linear block codes such as Golay codes and Reed-Muller codes.
This assumption is justified by the fact that the minimum
distance and dimension of a given product code are directly
related to the distance and dimension of its component codes.
For this reason, we are interested in product codes using good
binary block codes as components. Extending the work in [3],
we will provide a systematic way of obtaining an optimal
sub-code with an acyclic, uniform Tanner graph with the largest
possible distance such that the number of the corresponding
cosets is minimized and decoding complexity is lowered.

II.  MAIN 

It is well known that the generator matrix for the product
of codes A and B is given by the Kronecker product of
their generators, that is, $G_A \otimes G_B$. It is also known that if
cations and Information Technology Ontario (CITO).


A. K. Khandani
Dept. of Elect. & Comp. Eng.

University of Waterloo, Waterloo
Ontario, Canada, N2L 3G1
khandani@shannon.uwaterloo.ca

two matrices G and G' differ by a permutation of rows and
columns, then their corresponding Tanner graphs are
isomorphic. Allowing for row and column permutations, we will show
that if C' and M' are sub-codes of codes C and M,
respectively, and the corresponding generators decompose as
$G_C = G_{C'} + G_{C/C'}$ and $G_M = G_{M'} + G_{M/M'}$, then the
product of C and M is equal to the union of the sub-code $C' \otimes M'$
and its cosets, which can be easily calculated from appropriate
products of C', M', C/C', and M/M'.
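
The Kronecker-product construction of the generator can be checked on a toy case. The sketch below is only an illustration: the two (3, 2, 2) single parity-check component codes and the helper names `kron`, `codewords`, `min_distance` and `as_array` are assumptions made here, not taken from the paper. It builds $G_A \otimes G_B$ and confirms that every codeword, written as an array, has rows and columns in the component codes.

```python
from itertools import product

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def codewords(G):
    """All binary codewords spanned by the rows of G."""
    k = len(G)
    return [tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
            for msg in product((0, 1), repeat=k)]

def min_distance(G):
    """Minimum Hamming weight over the non-zero codewords."""
    return min(sum(w) for w in codewords(G) if any(w))

def as_array(word, width):
    """Reshape a flat codeword into rows of the given width."""
    return [list(word[i:i + width]) for i in range(0, len(word), width)]

# Hypothetical component codes: two (3, 2, 2) single parity-check codes.
G_A = [[1, 0, 1], [0, 1, 1]]
G_B = [[1, 0, 1], [0, 1, 1]]
G_AB = kron(G_A, G_B)          # 4 x 9 generator of the product code

print(len(G_AB), len(G_AB[0]), min_distance(G_AB))
```

For this choice the product code has dimension 2 x 2 = 4, length 3 x 3 = 9 and minimum distance 2 x 2 = 4, as expected for a product code.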

Consider an n-dimensional product of good codes. It was
shown in [3] that each of these codes has an acyclic sub-code
with a generator of the form $\mathcal{R}_m \otimes \mathcal{P}_n$, where
$\mathcal{R}_m$ and $\mathcal{P}_n$ are generator matrices of a repetition
code of length m and a simple parity-check code of length n,
respectively. Hence, an n-dimensional product code will have
a sub-code of the form

$(\mathcal{R}_{j_1} \otimes \mathcal{P}_{i_1}) \otimes (\mathcal{R}_{j_2} \otimes \mathcal{P}_{i_2}) \otimes \cdots \otimes (\mathcal{R}_{j_n} \otimes \mathcal{P}_{i_n})$.

Regrouping, and using the facts that the Kronecker product is
associative and that the Kronecker product of repetition codes
is simply another repetition code, this can be rewritten as

$\mathcal{R}_t \otimes (((\mathcal{P}_{i_1} \otimes \mathcal{P}_{i_2}) \otimes \mathcal{P}_{i_3}) \cdots \mathcal{P}_{i_n})$, for some $\mathcal{R}_t$.

We will show using the results of [2] that the product code
given by the above equation always has cycles if it includes
more than one parity check code. The aim is to show how to
obtain an acyclic sub-code of the form $\mathcal{R} \otimes \mathcal{P}$ for these cases.
We will first show how to find an optimal acyclic sub-code for
the case of $\mathcal{P}_{i_1} \otimes \mathcal{P}_{i_2}$. We use this result to find an optimal
acyclic sub-code for $\mathcal{P}_{i_1} \otimes \mathcal{P}_{i_2} \otimes \mathcal{P}_{i_3}$ in a recursive manner, since
this product can be considered as $(\mathcal{P}_{i_1} \otimes \mathcal{P}_{i_2}) \otimes \mathcal{P}_{i_3}$ and it has a
sub-code of the form $\mathcal{R} \otimes (\mathcal{P} \otimes \mathcal{P}_{i_3})$. Using this approach n - 1
times, it follows that $(\mathcal{P}_{i_1} \otimes \mathcal{P}_{i_2} \otimes \cdots \mathcal{P}_{i_n})$, and consequently the
n-dimensional product code, will have an acyclic sub-code of
the form $\mathcal{R} \otimes \mathcal{P}$ of appropriate sizes. Finally, following earlier
work in [3], the simple structure of $\mathcal{R} \otimes \mathcal{P}$ allows it to be easily
decoded using the Wagner rule in conjunction with the trellis
representation of the corresponding cosets.
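
For readers unfamiliar with the Wagner rule mentioned above, a minimal sketch for a single parity-check code follows. It assumes antipodal signalling (bit 0 sent as +1, bit 1 as -1) and uses the magnitude of each received value as its reliability; the function name `wagner_decode` is chosen here for illustration, and the coset/trellis part of the scheme is not shown.

```python
def wagner_decode(y):
    """Wagner rule for an even-parity SPC code.

    y: received real values, with bit 0 sent as +1 and bit 1 as -1.
    Take hard decisions; if the parity check fails, flip the least
    reliable position (smallest |y|).
    """
    bits = [0 if v >= 0 else 1 for v in y]
    if sum(bits) % 2 == 1:
        weakest = min(range(len(y)), key=lambda i: abs(y[i]))
        bits[weakest] ^= 1
    return bits

# Parity fails after hard decisions, so the weakest symbol (index 1) is flipped.
print(wagner_decode([0.9, -0.1, 1.2, 0.8]))
```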
Abstract - Many block codes can be represented as
an intersection of two or more easily decoded codes.
We present a new decoding algorithm for decoding
product codes that utilizes this property. It will be
shown that this algorithm is maximum likelihood
decoding. The complexity of the algorithm depends on
the decoding complexity of the constituent codes and
the quality of the channel.

I.  Summary 

Let x and y be, respectively, the codeword representing the
message and the output from the channel. Also, let c be
the codeword in C nearest to y, i.e., the maximum likelihood
estimate. Let $C_1'$ and $C_2'$ be, respectively, an $(n_1, k_1, d_1)$ and
an $(n_2, k_2, d_2)$ code over $\mathbb{F}_2$. Let the code $C_1$ with parameters
$(n_2 n_1, n_2 k_1, d_1)$ be defined as
$C_1 = \{u \mid u \in \mathbb{F}_2^{n_2 \times n_1},\ u_{i,\cdot} \in C_1',\ i \in \{1, \ldots, n_2\}\}$,
where $u_{i,\cdot}$ is the i:th row in the $n_2 \times n_1$
matrix u. In other words, $C_1$ is the $n_2$-fold direct sum, see
[1, page 76], over $C_1'$. In a similar way, we can obtain the
code $C_2$ with parameters $(n_2 n_1, k_2 n_1, d_2)$ by the $n_1$-fold direct
sum over the code $C_2'$, column-wise. Let C be the product
code obtained from $C_1'$ and $C_2'$, see [1, page 568]. Clearly the
following is valid: $C = C_1 \cap C_2$. For each word u in $\mathbb{F}_2^{n_2 \times n_1}$,
there might be more than one word at the same Hamming
distance from u. Therefore, we use a metric function $D(\cdot, \cdot)$
that resolves such ties. In the case of soft decoding with high
precision, the squared Euclidean distance can be used since
the probability of ties would approach zero. Let S be a list of
all the codewords in $C_1$ with Hamming distances from y less
than or equal to the covering radius of C, listed in ascending
order using the distance $D(\cdot, \cdot)$ mentioned above. It is easy to
see that c will be a member of this list since c is an element in
$C_1$ too. The list T can also be generated in a similar manner
by list decoding on $C_2$. The decoding commences by checking
the words in S one by one, beginning from the first word and
proceeding downward, to see if each is also a codeword in $C_2$. The algorithm
stops when c is reached, which is the first word that passes
the check. An alternative method would be to jump between
the two lists S and T, checking the elements of these lists at
increasing distance until c is reached in either one of the two
lists. It is clear from the discussion above that the algorithm
is maximum likelihood. The bottleneck of the algorithm
is the list decoding of $C_1$ and $C_2$. In the case of product codes,
however, this problem is reduced to list decoding the rows or
the columns, assuming that there exists an algorithm for list
decoding of $C_1'$ and $C_2'$. A list decoder for $C_1$ can be made by the
direct sum of members of the list decoding of $C_1'$. In a similar
manner we can obtain the list decoding for $C_2$ by list decoding
the columns. Instead of generating the elements of the list S,
the elements of the list decoder for each row are stored with
their respective distances from their corresponding rows in y.
At iteration l the decoder searches through the rows, beginning
from the first row, using only l elements from each row to
generate the l nearest different combinations of elements and
discarding the rest, until reaching the last row in the received
matrix; the l:th member of the list S is thus generated and
checked to see whether it is the required solution.
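
The basic search over the list S can be sketched as follows. This is a toy illustration under stated assumptions rather than the authors' implementation: both component codes are taken to be the (3, 2) single parity-check code, the metric D is the squared Euclidean distance with bit 0 sent as +1 and bit 1 as -1, and S is built exhaustively from all of $C_1$ instead of through row-wise list decoding restricted by the covering radius. The names `intersection_decode`, `dist` and `in_C2` are hypothetical.

```python
from itertools import product

# Hypothetical component code: the (3, 2) single parity-check code.
SPC3 = [c for c in product((0, 1), repeat=3) if sum(c) % 2 == 0]

def dist(y, u):
    """Squared Euclidean distance between y and the antipodal image of u."""
    return sum((y[i][j] - (1 - 2 * u[i][j])) ** 2
               for i in range(3) for j in range(3))

def in_C2(u):
    """True if every column of the 3 x 3 array u is an SPC codeword."""
    return all(sum(col) % 2 == 0 for col in zip(*u))

def intersection_decode(y):
    """Scan C_1 (all rows in SPC3) in order of increasing distance from y
    and return the first word that also lies in C_2, plus the number of
    words checked."""
    S = sorted(product(SPC3, repeat=3), key=lambda u: dist(y, u))
    for checked, u in enumerate(S, start=1):
        if in_C2(u):
            return u, checked

y = [[0.8, -1.1, -0.9],
     [1.0, 0.2, 1.2],
     [-0.7, -0.9, 0.1]]
word, checked = intersection_decode(y)
print(word, checked)
```

Because S is sorted by distance and the product code is contained in $C_1$, the first word of S that passes the column check is the nearest product codeword.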

An important note is that a limited number of list
elements in each row can generate a very large number of the
elements in the list S. This is the main argument for lower
complexity decoding for such codes. If a large product code is
used on a memoryless channel with transition
probability slightly greater than $d_1/2n_1$, there will be, on average,
a list of one solution for decoding each row that contains the
correct solution, resulting in a very small list S that has to
be checked. A GMD decoder, see [2], cannot decode such
error patterns correctly. When the transition probability
exceeds $d_1/n_1$, however, the size of the list increases exponentially,
which makes the algorithm impractical. On the other hand,
the  minimal  trellis  complexity,  taken  as  the  maximum  number 
of  states  in  the  trellis,  of  the  same  product  code  is  of  the  or¬ 
der  of  0( 2mi"I fcl  — frl •*2’™2  —  ^ ) ,  see  [3,  page  76].  In  order  to 
evaluate the performance of the algorithm, many simulations
were performed for different product codes with different rates.
In all simulations a suboptimal algorithm that utilizes a Chase
3 decoder, see [4, page 76], was used to generate a list of at
most two solutions to be used in the decoding for the rows or
columns. In each iteration the result of the previous iteration
is list decoded instead of decoding the original message. This
is done in order to keep the complexity of the decoder to a
minimum, comparable to bounded minimum distance decoding. In
all simulations the new algorithm had a better decoding gain
than the GMD decoder by about 2 dB. The performance can
be further increased with increasing complexity.

References 

[1] MacWilliams and Sloane, The Theory of Error Correcting
Codes. North Holland, New York, 1977.

[2] G. D. Forney Jr., Generalized minimum distance decoding. IEEE
Trans. IT, vol. 12, pp. 125-131, April 1966.

[3] A. Vardy, Trellis Structure of Codes. Handbook of Coding
Theory, North Holland, pp. 1989-2118, 1998.

[4] D. Chase, A class of algorithms for decoding block codes with
channel measurement information. IEEE Trans. IT, vol. 18, no.
1, pp. 170-182, January 1972.


0-7803-5857-0/00/$10.00 ©2000 IEEE.


ISIT 2000, Sorrento, Italy, June 25-30, 2000


Randomly  Interleaved  SPC  Product  Codes 


David  Rankin 

Dept. of Electrical and Electronic
Engineering

University of Canterbury
Private Bag 4800
Christchurch
New Zealand

dmr43@elec.canterbury.ac.nz

Abstract  —  This  paper  considers  single  parity  check 
(SPC)  product  codes  which  are  randomly  interleaved 
between  the  encoding  of  each  dimension.  Using  ran¬ 
dom  interleaving  reduces  the  number  of  low  weight 
codewords  and  so  improves  performance. 

I.  Encoding  and  Decoding 

The encoding process is very simple: after every parity
check equation is encoded in a single dimension, the data (and
possibly the parity bits) are interleaved before the next
dimension is encoded. The component codes are equal-length
single parity check (SPC) codes and hence the code rate is
$R = K/N$, where $N = n^d$, $K = (n-1)^d$, d is the number of
dimensions, and n is the length of the component codes. Unlike
the decoding of a traditional SPC product code, a randomly
interleaved (RI) SPC product code must be decoded in the
reverse order of the encoding process. Naturally this code is
very similar to a serially concatenated code with the
appropriate interleaver size. The component decoders are maximum
a posteriori (MAP) decoders in the log likelihood domain, hence
the bit error probability in the component code is minimised.
Furthermore, the extrinsic information and received channel
values are interleaved/de-interleaved as they are passed
between the decoders in each dimension.
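
The encoding procedure described above can be sketched in a few lines. This is an illustrative sketch under assumptions made here: data and parity bits are interleaved together, the interleaver is a uniformly random permutation of the whole block, and no interleaving follows the last dimension. The names `ri_spc_encode` and `code_rate` are hypothetical.

```python
import random

def code_rate(n, d):
    """Rate R = K/N with N = n**d and K = (n - 1)**d."""
    return ((n - 1) / n) ** d

def ri_spc_encode(data, n, d, rng):
    """Encode (n - 1)**d data bits with d rounds of (n, n - 1) SPC encoding,
    randomly permuting all bits (data and parity) between rounds."""
    bits = list(data)
    assert len(bits) == (n - 1) ** d
    for dim in range(d):
        encoded = []
        for i in range(0, len(bits), n - 1):
            block = bits[i:i + n - 1]
            encoded.extend(block + [sum(block) % 2])   # append even parity
        bits = encoded
        if dim < d - 1:
            rng.shuffle(bits)                          # random interleaver
    return bits

rng = random.Random(0)
n, d = 4, 3
data = [rng.randint(0, 1) for _ in range((n - 1) ** d)]
codeword = ri_spc_encode(data, n, d, rng)
print(len(codeword), round(code_rate(n, d), 4))
```

The block grows from $(n-1)^d$ to $n^d$ bits, one factor of $n/(n-1)$ per dimension, which is where the rate $((n-1)/n)^d$ comes from.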

II.  Low  Weight  Codewords 

In [1] RI SPC product codes have been analysed in terms of
partial weight distributions corresponding to the input-output
weight distributions after the encoding of a single dimension.
Therefore the expected weight distribution of the overall code
can be calculated over the ensemble of random interleavers
by considering each dimension to be independently encoded
with the input weight equal to the output weight of the
previous dimension (assuming both the data and parity checks
are interleaved). The partial input-output weight enumerator
function (IOWEF) for the code has been calculated by
considering invariant (under permutation) input patterns of a given
input weight. The expected weight distribution for
three-dimensional RI SPC product codes (interleaving both the data
and parity bits) with n = 8 is given by

$B_0 = 1,\ B_2 = 0.3,\ B_4 = 21.9,\ B_6 = 160.4,\ B_8 = 2668.5$

compared to $B_0 = 1$ and $B_8 = 21952$ for a traditional SPC
product code. This shows a trade-off between the reduced
number of low weight codewords and the reduction in
minimum distance. However, as the number of dimensions increases,
the reduction in the number of low weight codewords more
than offsets the possible reduction in the minimum distance
of the code.
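
The value $B_8 = 21952$ quoted for the traditional code can be cross-checked with a short calculation. A minimum-weight codeword of a d-dimensional SPC product code is a product of weight-2 component codewords, so it has weight $2^d$ and there are $\binom{n}{2}^d$ such codewords. The helper name below is chosen here for illustration.

```python
from math import comb

def spc_product_min_weight_count(n, d):
    """Minimum distance and number of minimum-weight codewords of a
    traditional d-dimensional product of (n, n - 1) SPC codes."""
    return 2 ** d, comb(n, 2) ** d

print(spc_product_min_weight_count(8, 3))
```

For n = 8 and d = 3 this gives minimum distance 8 with 28^3 = 21952 codewords of that weight, matching the figure in the text.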
Figure 1: Performance of RI SPC Product Codes


III.  Results 

Simulation results for two- to five-dimensional RI SPC
product codes with n = 8 and n = 10 are given in Fig. 1.
The results are shown as code rate versus $E_b/N_0$ for a
probability of bit error equal to $10^{-5}$. The performance of these
very simple codes is quite exceptional, especially as the size of
the component code increases and/or the number of
dimensions increases. Note the four-dimensional (20, 19) SPC code
with rate 0.8145. Capacity of the binary input AWGN channel
for this rate occurs at $E_b/N_0 = 2.15$ dB, therefore this code is
only 0.63 dB away from capacity at $P_b = 10^{-5}$. A disadvantage
is the exponential increase in the blocklength as the number
of dimensions (and the size of the component code) increases.
It should be noted that the "error floor" predicted by the
analysis typically becomes evident at $P_b < 10^{-5}$ for four- and
five-dimensional codes. No attempt was made to optimize the
interleavers; only randomly generated interleavers were used.
Better interleaver design should improve the error floor.

IV.  Summary 

Randomly  interleaved  SPC  product  codes  are  extremely 
simple  to  encode  and  decode  and  yet  perform  surprisingly  well 
due  to  the  decrease  in  the  number  of  low  weight  codewords. 
Abstract - We present a new soft decision majority
decoding algorithm for Reed-Muller codes RM(r,m).
First, the reliabilities of all received symbols are
recalculated into the reliabilities of the parity checks that
represent each information bit. In turn, information
bits are obtained by the weighted majority that gives
more weight to the more reliable parity checks. It
is proven that for long low-rate codes RM(r,m), our
soft decision algorithm outperforms its conventional
hard decision counterpart by $10\log_{10}(\pi/2) \approx 2$ dB at
any given output bit error rate $\epsilon < 1/2$.

I.  Introduction 

Consider general Reed-Muller codes RM(r,m) [3] of length
$n = 2^m$, dimension $k = \sum_{i=0}^{r} \binom{m}{i}$, and code distance
$d = 2^{m-r}$. The majority algorithm [1] provides bounded distance
decoding with complexity order of nk or less. Also, this
decoding corrects many error patterns beyond the weight d/2 [2].
We consider majority decoding (see also [4]) for RM codes
used over channels with white Gaussian noise $\mathcal{N}(0, \sigma^2)$.
The two symbols 0 and 1 are transmitted as +1 and -1. These
two take arbitrary real values u at the receiver end with
probability densities g(u - 1) and g(u + 1), where

$g(u) = e^{-u^2/2\sigma^2} / \sqrt{2\pi\sigma^2}$.
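
As a quick numerical companion to the parameters above, the following sketch evaluates length, dimension and distance of RM(r,m); the function name `rm_params` is chosen here for illustration.

```python
from math import comb

def rm_params(r, m):
    """Length n = 2^m, dimension k = sum_{i<=r} C(m, i), distance d = 2^(m-r)."""
    n = 2 ** m
    k = sum(comb(m, i) for i in range(r + 1))
    d = 2 ** (m - r)
    return n, k, d

print(rm_params(1, 3))   # first-order code of length 8
print(rm_params(2, 5))   # second-order code of length 32
```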

We wish to process further the likelihoods p(0|u) and p(1|u)
while keeping the complexity O(nk) of majority schemes.
More specifically, the following questions arise:

•  Can these likelihoods improve the performance of majority
decoding?

•  How much can we reduce the possible S/N ratios?

•  How many "hard decision" errors can we correct?

II.  Decoding  algorithm 

The idea of our algorithm is as follows. Each information
symbol of order r can be found from $2^{m-r}$ independent parity
checks defined over disjoint subsets of $2^r$ code symbols. The
simple majority of these checks is taken in hard-decision
decoding. By contrast, in soft-decision decoding we use weighted
majority. First, we recalculate the initial reliabilities of $2^r$
transmitted symbols into the reliability of the corresponding
parity check. Second, the majority voting scheme accumulates
all $2^{m-r}$ parity checks and gives more weight to the more
reliable ones.
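
A minimal sketch of weighted majority voting is given below for the first-order code RM(1,m) only. The text does not state how a parity-check reliability is computed from its symbols; as an assumption, the sketch uses the product of the two received values in each check, so that its sign is the hard vote and its magnitude is the weight. The function name `soft_majority_rm1` and the example values are hypothetical.

```python
def soft_majority_rm1(y, m):
    """Estimate the first-order information bits a_1..a_m of an RM(1, m)
    codeword c(x) = a_0 + a_1 x_1 + ... + a_m x_m from real values y,
    where bit 0 is received around +1 and bit 1 around -1.

    For bit a_j the 2^(m-1) disjoint pairs {x, x with x_j flipped} each
    form one parity check; the product of the two received values is the
    (assumed) signed reliability of that check.
    """
    bits = []
    for j in range(m):
        step = 1 << j
        total = sum(y[x] * y[x ^ step] for x in range(1 << m) if not x & step)
        bits.append(0 if total >= 0 else 1)
    return bits

# Codeword for (a_0, a_1, a_2, a_3) = (1, 0, 1, 1) is 1 1 0 0 0 0 1 1;
# position 2 is received with the wrong sign but low reliability.
y = [-1.0, -0.8, -0.1, 1.2, 0.9, 1.1, -1.0, -0.7]
print(soft_majority_rm1(y, 3))
```

An unreliable check contributes little to the sum, which is the sense in which the vote gives more weight to the more reliable parity checks.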

To estimate performance of a given code RM(r,m), we fix
an output bit error rate $\epsilon < 1/2$. Then we introduce the
$\epsilon$-sustainable noise powers $\sigma_h^2(\epsilon)$ and $\sigma_s^2(\epsilon)$. These are the
maximum noise powers that support BER $\epsilon$ in hard- and
soft-decision decoding, respectively. Similarly, we use the
corresponding $\epsilon$-sustainable transition error probabilities $p_h$ and
$p_s$. Our main theoretical result is that soft-decision
decoding gains $10\log_{10}(\pi/2) \approx 2.0$ dB over the conventional majority
scheme for all long low-rate RM codes at any output error
rate $\epsilon$. We also keep the former complexity order of O(nk).
The results are summarized below.

Theorem 1 For any output bit error probability $\epsilon$, soft
decision decoding of long codes RM(r,m) of fixed order r increases
$\pi/2$ times the $\epsilon$-sustainable noise power over hard decision
decoding:

$\sigma_s^2/\sigma_h^2 \to \pi/2, \quad m \to \infty.$

For any output bit error probability $\epsilon$, soft decision decoding
of long codes RM(r,m) of fixed code rate $R \in (0, 1)$ increases
$4/\pi$ times the $\epsilon$-sustainable transition error probability over hard
decision decoding:

$p_s/p_h \to 4/\pi, \quad m \to \infty.$
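
The decibel figure quoted for the noise-power ratio follows directly from the limit $\pi/2$ in Theorem 1; the short calculation below makes the arithmetic explicit (the variable names are chosen here).

```python
import math

# Asymptotic ratio of sustainable noise powers (soft over hard), in dB.
gain_db = 10 * math.log10(math.pi / 2)

# Asymptotic ratio of sustainable transition error probabilities.
prob_ratio = 4 / math.pi

print(round(gain_db, 2), round(prob_ratio, 3))
```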

We also find the Euclidean weights of the error patterns
correctable by our algorithm. The statement below shows that
this algorithm exceeds about $2^{r/2}$ times the capacity $\sqrt{d}$ of
bounded distance decoding.

Theorem 2 For $m \to \infty$, soft decision majority decoding of
codes RM(r,m) corrects virtually all error patterns of
Euclidean weight:

$\rho < \sqrt{n}\,(d/2m)^{1/2^{r+1}}$, if r = const,

$\rho < \sqrt{n/(m \ln 2)}$, if 0 < R < 1.

From the practical standpoint, we obtain tight numerical
bounds on the output bit error rate for any code RM(r,m).
When these bounds were compared with simulation results,
both turned out to be almost identical.
Abstract - The A* algorithm is applied to
soft-decision maximum-likelihood decoding (MLD) of
linear block codes when intersymbol interference (ISI)
is present. Results for a small set of channels and
codes show that the chosen column permutation of
the generator matrix for the code affects not only the
decoding complexity, but also the error performance.

I.  Introduction 

Consider a system where codewords from a block code are
transmitted using linear modulation on a band-limited
channel, such that ISI is present at the channel output, where white
Gaussian noise is added. The decoding approach taken here is
to treat the encoder and the discrete-time whitened matched
filter (WMF) receiver [1] as a joint entity. Using finite-state
machine (FSM) descriptions of the encoder and the channel,
the state vector for a joint FSM describing the whole system is
obtained by concatenating the state vectors of the component
state machines [2]. MLD can then be stated as determining
the most likely sequence of state transitions of the joint FSM,
i.e., the optimal path through the joint trellis, given the
received signal. A* is a heuristic graph algorithm [3] that can
be used to perform that search.

II.  Maximum-Likelihood  Decoding 

Let $a_j = (a_{jN}, \ldots, a_{(j+1)N-1})$ be a codeword $c_j$ from a
binary linear block code C and let $\alpha_j = (\alpha_{jN}, \ldots, \alpha_{(j+1)N-1})$
be the sequence of channel symbols corresponding to $a_j$. The
outputs of the WMF are then $s_n = \sum_{k=0}^{L-1} f_k \alpha_{n-k}$ [1]. For
simplicity, assume that $\alpha_j = 0$ for $j \neq 0$.

Then $s_0 = (s_0, \ldots, s_{N+L-2})$ are the only filtered symbols
affected by $a_0$, and therefore $a_0$, $\alpha_0$ and $s_0$ are mapped
one-to-one. At the WMF output, zero-mean white Gaussian noise
samples with variance $\sigma_\eta^2 = N_0$ are added, yielding the
received sequence $z_0 = (z_0, \ldots, z_{N+L-2}) = s_0 + \eta_0$.

The task of the joint ML decoder is to determine the
codeword $a_0$ that maximizes the likelihood function $p_{z_0}(z_0 | a_0)$.
Due to WMF properties, this is equivalent to determining the
codeword that minimizes the cost, the squared Euclidean
distance between $z_0$ and $s_0$, i.e. $\hat{a}_0 = \arg\min_{a_0 \in C} \| z_0 - s_0 \|^2$.
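
The cost function above can be illustrated with an exhaustive search on a toy example. The sketch assumes a (3, 2) single parity-check code, antipodal channel symbols (bit 0 as +1, bit 1 as -1) and a two-tap channel with $f = (1, 0.5)$; these choices and the names `isi_output` and `ml_decode` are made here for illustration and are not from the paper, which uses A* rather than brute force.

```python
from itertools import product

def isi_output(alpha, f):
    """Noise-free WMF outputs s_n = sum_k f[k] * alpha[n - k], n = 0..N+L-2."""
    N, L = len(alpha), len(f)
    return [sum(f[k] * alpha[n - k] for k in range(L) if 0 <= n - k < N)
            for n in range(N + L - 1)]

def ml_decode(z, code, f):
    """Exhaustive ML decoding: codeword whose filtered image is closest to z."""
    def cost(c):
        s = isi_output([1 - 2 * b for b in c], f)
        return sum((zn - sn) ** 2 for zn, sn in zip(z, s))
    return min(code, key=cost)

# Hypothetical example: (3, 2) single parity-check code, two-tap channel.
CODE = [c for c in product((0, 1), repeat=3) if sum(c) % 2 == 0]
F = [1.0, 0.5]

# Codeword (1, 1, 0) gives s = [-1, -1.5, 0.5, 0.5]; z adds small noise.
z = [-0.8, -1.6, 0.8, 0.3]
print(isi_output([1, -1, 1], F), ml_decode(z, CODE, F))
```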

A coarse approximation of the word error probability for
high SNRs is given by $P_w \approx Q(d_{\min}/(2\sigma_\eta))$ [1], where $d^2_{\min}$
is the minimum squared distance between any two codewords at the
filter output, ignoring the multiplicity of the error event.

The state of the ISI FSM is defined by the (L - 1) most
recently transmitted symbols, and the state of the encoder FSM
at time i is defined by the $\rho_i \le \rho_{\max} \le \min(K, N - K)$
active information bits. Concatenating these state definitions
yields a $(\rho_{\max} + L - 1)$-bit joint state vector, $\sigma_i$.

A* needs an evaluation function, $f(\sigma_i) = g(\sigma_i) + h(\sigma_i)$,
to be defined for each trellis state $\sigma_i$, associating with it an
underestimate of the total cost of any path passing through it.
$g(\sigma_i)$ is defined as the actual cost of the path taken from the
initial node, $\sigma_0$, to reach $\sigma_i$. The definition of $h(\sigma_i)$ proposed
here is the minimum cost of any length-(N + L - 1 - i) path
from $\sigma_i$, not necessarily consistent with the code. This cost is
easily determined with the Viterbi algorithm (VA).
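
The role of the evaluation function $f = g + h$ can be illustrated with a deliberately simplified sketch. The following assumptions depart from the system described above and are made only to keep the example short: there is no ISI (L = 1), the search runs over a tree of information-bit prefixes of a hypothetical systematic (6, 3) code instead of the joint trellis, and the underestimate h is the sum of per-symbol minimum costs, ignoring the code constraint, rather than a Viterbi computation. All names are chosen here.

```python
import heapq
from itertools import product

# Hypothetical systematic (6, 3) generator matrix, G = [I | P].
G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]

def encode(info):
    return [sum(b * g for b, g in zip(info, col)) % 2 for col in zip(*G)]

def cost(z, bits):
    """Squared Euclidean distance to the antipodal image (0 -> +1, 1 -> -1)."""
    return sum((zn - (1.0 - 2.0 * b)) ** 2 for zn, b in zip(z, bits))

def astar_decode(z):
    K, N = len(G), len(G[0])
    # h[i]: lower bound on the cost of positions i..N-1 over ALL symbol
    # sequences, whether or not they are consistent with the code.
    best = [min((zn - 1.0) ** 2, (zn + 1.0) ** 2) for zn in z]
    h = [sum(best[i:]) for i in range(N + 1)]
    heap = [(h[0], 0.0, (), False)]            # (f, g, info prefix, complete?)
    while heap:
        f, g, prefix, complete = heapq.heappop(heap)
        if complete:
            return encode(prefix)              # first goal popped is optimal
        i = len(prefix)
        for b in (0, 1):
            new = prefix + (b,)
            if len(new) == K:                  # codeword fully determined
                total = cost(z, encode(new))
                heapq.heappush(heap, (total, total, new, True))
            else:
                g2 = g + (z[i] - (1.0 - 2.0 * b)) ** 2
                heapq.heappush(heap, (g2 + h[i + 1], g2, new, False))

z = [0.9, -0.2, 1.1, -0.8, 0.3, 0.6]
print(astar_decode(z))
```

Because h never overestimates the remaining cost, the first complete codeword removed from the queue is a minimum-cost (maximum-likelihood) codeword.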

III.  Results  and  Conclusions 

For simulations and $d^2_{\min}$-calculations, real-valued (L = 2)-channels
characterized by $f_1/f_0$ have been considered. The
codes studied are different-complexity permutations of the
extended Golay (24, 12) code and various BCH and RM codes.

For the selected codes and channels, the lower-complexity
permutations have worse error performance than their
higher-complexity counterparts in terms of $d^2_{\min}$ (see Fig. 1), though
the decoding complexity in terms of expanded nodes and
examined edges is lower. Simulation results support this for
all considered different permutations of the same code. For
the $P_w$-simulation of the Golay codes on the $f_0 = f_1$
channel, the gain of the higher-complexity permutation over the
lower-complexity permutation is more than 3 dB in $E_b/N_0$ at
$P_w \approx 7 \times 10^{-4}$.

For all considered codes, the number of expanded nodes
and considered edges approach constant values for very high
and low SNRs. For high SNRs, the distribution of the
number of expanded nodes becomes narrow, and an average of
(N + L - 1) nodes are expanded. Detecting and ignoring
repeatedly visited nodes yields only a negligible improvement
in the decoding complexity.


Fig. 1: $d^2_{\min}$ as function of $f_1/f_0$
Abstract - We propose a new soft decoding
algorithm for long general binary linear codes, based on
information set decoding. Its specificity is that it is
derived from the fastest hard decoding algorithm for
long codes, which explores successively information
sets close to each other, and that the search is guided
by the reliability values thanks to a technique inspired
by stochastic resonance. It can reach for instance a
bit error rate of $10^{-6}$ at 3 dB in a reasonable time.

I.  Introduction 

General (random) binary linear codes provide correcting
capacity depending on their length and rate. For a given
channel and at a given rate (lower than the channel capacity), the
decoding error probability decreases exponentially with the
length of the code (cf. [1]). Unfortunately, the computation
cost of complete decoding is also exponential in this length.

The decoding algorithms generally explore an exponential
set (the set of codewords, of error patterns, of information sets
of the code...), and, after an adjustable computational effort,
return the best element they have found. When the
computational effort tends to infinity, the decoding error probability
tends to that of complete decoding. Beyond a given
computational effort, they hence perform quasi-complete decoding.

In this context, soft decoding has two different advantages
over hard decoding. The first one is provided by the greater
accuracy of the distance between a received word and the
possible codewords, which allows, for the same signal to noise
ratio (SNR), a better residual error rate, or a decrease
(by approximately 2 dB) in the SNR required
to achieve a given residual error rate.

The second one comes from the fact that the reliability
information may offer a guideline to the algorithms in their
exploration of the set, and may consequently reduce
significantly the search space required to achieve a residual error
rate close to that of complete decoding.

This work is an adaptation to soft decoding of what
is believed to be the fastest general quasi-complete hard
decoding algorithm for long codes, in a way that intends to turn
these two advantages to the best possible account.

II.  How  to  guide  Information  Set  Decoding 

The above-mentioned hard decoding algorithm is a
particular information set decoding algorithm designed by Canteaut,
Chabanne and Chabaud in 1994 in order to improve the
attacks on cryptosystems based on error-correcting codes, like
McEliece's cryptosystem (cf. [2]). In short, it searches for
an information set with as few errors as possible, and when
changing information set, only one information position is
removed from it while one new position is admitted.

1 This work was supported by a grant from the D.G.A.

Since we want the information set to contain few errors,
we can start the search (like in [3], [4]) with the most reliable
basis, the set of the k most reliable independent positions.
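
One common way to obtain a most reliable basis is a greedy scan of the positions in order of decreasing reliability, keeping a position whenever its generator-matrix column is linearly independent of those already kept. The sketch below follows that approach; the (6, 3) generator matrix, the reliability values and the name `most_reliable_basis` are hypothetical.

```python
def most_reliable_basis(G, reliab):
    """Return the k most reliable linearly independent column positions of
    the binary k x n matrix G, scanning positions by decreasing reliability."""
    k, n = len(G), len(G[0])
    pivots = {}        # highest set bit -> reduced column (as an int bit mask)
    chosen = []
    for j in sorted(range(n), key=lambda j: -reliab[j]):
        v = sum(G[i][j] << i for i in range(k))   # column j packed into an int
        while v:
            top = v.bit_length() - 1
            if top not in pivots:                 # independent of kept columns
                pivots[top] = v
                chosen.append(j)
                break
            v ^= pivots[top]                      # eliminate the leading bit
        if len(chosen) == k:
            break
    return chosen

G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
reliab = [0.9, 0.2, 1.1, 0.3, 1.0, 0.6]
print(most_reliable_basis(G, reliab))
```

In this example position 0 is the third most reliable, but its column is the sum of the columns at positions 2 and 4, so it is skipped in favour of position 5.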

To continue the search, a choice has to be made: which
information position to reject and which new one to admit in
the information set? Deterministic methods based on the
reliability lead to a periodic exploration of the same information
sets. Preventing this implies an additional cost in space and
time that can be dramatically large for long codes.

Eventually,  the  chosen  method  is  based  on  a  controlled  ran¬ 
dom:  All  the  positions  in  (out  of)  the  information  set  can  be 
chosen  to  be  rejected  (admitted)  with  a  probability  that  is  a 
decreasing  (increasing)  function  of  their  reliability. 
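
The reliability-weighted choice described above can be sketched as follows. This is only an illustration under stated assumptions: the exponential weights, the parameter `beta`, the function name `pick_swap` and the sample reliabilities are hypothetical, since the text does not give the actual probability distributions. A complete decoder would also have to verify that the new set is still an information set (its columns of the generator matrix remain independent), which is omitted here.

```python
import math
import random

def pick_swap(info_set, reliability, rng, beta=1.0):
    """Pick (reject, admit): one position leaving and one entering the set."""
    inside = sorted(info_set)
    outside = [i for i in range(len(reliability)) if i not in info_set]
    # Assumed weight shapes: exp(-beta*r) decreases with the reliability r,
    # exp(+beta*r) increases with it. These are illustrative choices only.
    w_out = [math.exp(-beta * reliability[i]) for i in inside]
    w_in = [math.exp(beta * reliability[i]) for i in outside]
    reject = rng.choices(inside, weights=w_out, k=1)[0]
    admit = rng.choices(outside, weights=w_in, k=1)[0]
    return reject, admit

rng = random.Random(0)
reliability = [0.1, 2.3, 1.7, 0.4, 3.0, 0.9]   # e.g. magnitudes of soft values
info_set = {1, 2, 4}                            # the k = 3 most reliable positions
reject, admit = pick_swap(info_set, reliability, rng)
new_set = (info_set - {reject}) | {admit}       # exactly one position is exchanged
```

Because the choice is random, the same information set is not revisited periodically, while reliable positions still tend to stay in the set.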

By  looking  for  the  optimal  probability  distributions,  we  ob¬ 
serve  stochastic  resonance  phenomena:  For  a  given  set  of  re¬ 
liability  measures,  certain  probability  distributions  will  make 
our  algorithm  much  faster. 

The concept of stochastic resonance was introduced in 1981 (cf. [5])
in the study of the periodic variations of glaciers. It applies when
a process can turn noise to advantage and the moments of this noise
are adjusted to optimize this advantage. Such phenomena have been
observed in various areas (cf. [6]), a main application being the
improvement of lasers.

III.  Results 

The implemented version of the algorithm has been evaluated for a
Gaussian channel with antipodal modulation. For instance, it
performed quasi-complete decoding of a code of length 200 and
dimension 100, reaching a bit error rate of 10^-6 at 3 dB in a
reasonable time.

Abstract - In this paper we consider the special case of the
underdetermined LS problem in which the number of unknowns exceeds
the number of equations by one. We propose a new method to improve
the minimum norm solution based on an estimate of the norm of the
exact solution. The method is applied to the problem of
syndrome-based error control in image coding over real fields. A
series of computer simulations shows a significant gain in output
signal-to-noise ratio at high bit error rates in the channel.


I.  A  NORM  ESTIMATE  BASED  APPROACH  TO  IMPROVING 
THE  MINIMUM  NORM  SOLUTION 


Consider the special case of the rank N-1 underdetermined LS
system. It may be written in the following form

$Y_{N-1} = A_{(N-1)\times N}\, X_N$  (1)

where $Y_{N-1}$ is an (N-1)-dimensional vector of observations and
$A_{(N-1)\times N}$ is an (N-1)×N coefficient matrix.

We can write the expression for the squared norm of the vector of N
unknowns $X_N$ in terms of the given quantities of the rank N-1
system and the N-th unknown $x_N$ as

$\|X_N\|^2 = a x_N^2 + b x_N + c$  (2)

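where the coefficients a, b, and c are defined as follows

$a = 1 + [A^{-1}_{(N-1)\times(N-1)} A_{N,N-1}]^T [A^{-1}_{(N-1)\times(N-1)} A_{N,N-1}]$  (3)

$b = -2\, [A^{-1}_{(N-1)\times(N-1)} A_{N,N-1}]^T [A^{-1}_{(N-1)\times(N-1)} Y_{N-1}]$  (4)

$c = [A^{-1}_{(N-1)\times(N-1)} Y_{N-1}]^T [A^{-1}_{(N-1)\times(N-1)} Y_{N-1}]$  (5)

Here $A_{(N-1)\times(N-1)}$ denotes the first N-1 columns of the
coefficient matrix and $A_{N,N-1}$ its N-th column.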


By solving equation (2) we can write the expression for the unknown
$x_N$ in terms of the norm of the vector of N unknowns $X_N$ and the
coefficients a, b, and c as

$x_N = \frac{-b \pm \sqrt{b^2 - 4a(c - \|X_N\|^2)}}{2a}$  (6)

Equation (6) gives the relation between one unknown and the norm of
the exact solution to the full rank LS system.
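
The quadratic relation between the last unknown and the norm can be checked numerically. The sketch below is an illustration only: it assumes N = 3, splits the coefficient matrix into its first N-1 columns (assumed invertible) and its N-th column, and obtains a, b, c by substituting the first N-1 unknowns into the squared norm. The matrix `A`, the vector `x_true` and all function names are made up for the example.

```python
import math

def solve2(m, v):
    """Solve the 2x2 system m z = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - v[0] * m[1][0]) / det]

def coefficients(A, y):
    """a, b, c such that ||X||^2 = a*x_N^2 + b*x_N + c, for a 2x3 matrix A."""
    square = [row[:2] for row in A]     # first N-1 columns
    last = [row[2] for row in A]        # N-th column
    p = solve2(square, last)            # square^{-1} times the last column
    q = solve2(square, y)               # square^{-1} times the observations
    a = 1.0 + sum(pi * pi for pi in p)
    b = -2.0 * sum(pi * qi for pi, qi in zip(p, q))
    c = sum(qi * qi for qi in q)
    return a, b, c

def last_unknown(a, b, c, norm):
    """Both roots of a*x^2 + b*x + (c - norm^2) = 0."""
    root = math.sqrt(b * b - 4.0 * a * (c - norm * norm))
    return (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)

A = [[2.0, 1.0, 1.0], [1.0, 3.0, 2.0]]
x_true = [1.0, -2.0, 3.0]
y = [sum(r * x for r, x in zip(row, x_true)) for row in A]
a, b, c = coefficients(A, y)
roots = last_unknown(a, b, c, math.sqrt(sum(x * x for x in x_true)))
```

One of the two roots reproduces the true last unknown when the exact norm is supplied. The vertex of the quadratic, $x_N = -b/(2a)$, corresponds to the minimum norm solution, so a norm estimate larger than the minimum norm moves the estimate away from that vertex.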

Norm estimation can be considered under certain conditions, such as
those arising in image channel coding over the real fields [2]. In
fact we estimate the norm ratio (NR) of the exact norm $N_1$ to the
minimum norm $N_0$, i.e. $N_1/N_0$.


II. REAL NUMBER BCH (4,2) PRODUCT CODE

In [3], real-number codes based on the discrete Fourier transform
(DFT) are defined. It is important for decoding that the last c
elements of the error vector e represent the syndrome vector s [4].
Based on s, error signals can be estimated as a solution to the
standard least squares problem:

e=A*s  (7) 

where  A  is  a  c  x  c  data  matrix. 

As described in [2] for a (4,2) BCH code, almost all single errors
under the background noise can be detected and corrected based on a
syndrome $s = (s_1, s_2)$. To deal with multiple errors, the
approach based on norm estimation discussed in the preceding section
has been considered. For the considered case of the real-number
(4,2) BCH code, it can be found that in the noise-free case the norm
ratio (NR) is

$\alpha_0 = NR_0 = N_1/N_0 = \sqrt{1.333} = 1.1547$

so that this value can also be used as the optimal one in noisy
cases. For noisy cases the ratio $\alpha$ is a random variable. In
addition, the correlation between the minimum norm $N_0$ and the
norm ratio $NR_0$ has been considered, and a negative correlation
was identified, so that an adaptive algorithm for $\alpha$ can be
applied.

III. EXPERIMENTAL RESULTS

A first-order autoregressive process AR(1) driven by Gaussian noise
has been quantized with $x \in \{0, 1, \ldots, 255\}$, which fits
data such as images and videos quite well. Pairs of symbols are
coded with the (4,2) BCH code [2] and transmitted through an AWGN
channel. Based on Eq. (7), all noise-free and single-error cases are
decoded from the syndrome values $s_1$ and $s_2$. For all
multiple-error cases, a minimum norm solution (MNS) is calculated
and an $SNR_0$ is defined by

$SNR_0 = 10 \log[x/(x - x_0)]$ [dB], where $x = (x_1, x_2)$ is a
data vector and $x_0 = (x_{10}, x_{20})$ is an MNS estimate.

Simulations performed in Matlab show a significant SNR gain for
highly correlated sources. For a constant norm ratio of 1.1547 the
SNR gain was about 2.3 dB. For an adaptive norm ratio an additional
gain of about 1.5 dB is obtained, yielding a total gain of about
3.8 dB.

IV. CONCLUSIONS

In  this  paper  we  propose  a  new  method  to  improve  the  minimum 
norm  solution  to  the  special  case  of  the  underdetermined  LS 
problem  when  the  difference  between  the  number  of  unknowns 
and  the  number  of  equations  equals  one.  The  approach  is  based  on 
using  the  estimate  of  the  norm  of  the  exact  solution  to  the  full 
rank  system.  The  method  is  applied  to  the  problem  of  error  control 
over  real  fields  as  an  additional  algorithm  to  the  syndrome  based 
error  correction  in  multiple-error  cases.  Experimental  results  have 
shown  significant  gain  in  SNR. 

Abstract — A new class of convolutional codes, namely generalized
woven codes with outer warp, is presented. The codes are based on
nested classes of a single inner convolutional code and many outer
convolutional codes with different redundancies. We give a lower
bound on the free distance of these codes.

I.  Introduction 

In 1997, Höst, Johannesson and Zyablov presented woven convolutional
codes [1]. Here we extend these ideas and construct generalized
woven convolutional codes with outer warp (GWOW). We use a nested
system of binary convolutional codes at the inner stage and binary
convolutional codes with different rates as outer codes. The nested
system of inner codes is constructed on the basis of partitioning of
convolutional codes into subcodes. The partitioning principles for
convolutional codes have been described in [2, 3, 4] and can be
found in the full paper. Furthermore, we extend the active distance
ideas proposed in [5] to the case of nested subcodes. Based on these
results we determine the overall code rate and give a lower bound on
the free distance of GWOW codes.

II.  Generalized  Woven  Codes 

Figure 1: Encoding scheme of GWOW codes

Figure 1 shows the encoder of the proposed generalized woven codes
with outer warp. As the inner code, a kth order partitioned
convolutional code of rate R.b is used. We have k outer stages,
whereby each outer
stage  comprises  of  l%\  j  =  1, 2, ...,  k  interleaved  par¬ 
allel  convolutional  codes  A^  \  i  —  1,2 In  each 
stage  all  outer  codes  have  the  same  rate  Rfjp  and  they 
determine  the  sequences  which  are  encoded  by  the 
inner  partitioned  convolutional  code.  The  overall  code 
Ta.te  R.Gwow  = -Rfl(E*=i 

The partitioning method introduces scrambler matrices to construct
suitable equivalent encoding matrices for kth order partitioning. We
thereby obtain increasing free distances $d_{free}^{(j)}$,
j = 1, 2, ..., k in the subcodes. Furthermore, we also investigate
the active row distances of subcodes. In general, active row
distances of the
jth  subcode  can  be  lower  bounded  by  ar<-j\l)  > 

max  (a^^l  +  dfjp)  where  >  0,  €  R, 

l  —  0, 1, 2, .  Since  ar(j\l)  is  in  general  no  increasing 

function  we  define  ar^\l)  >  min;'>f  (ar^(Z))  which  is 
an  increasing  function. 

Theorem:  Let  >  lr^  where  Zr^  is  the  smallest  l 
for  which  cT’^(Z)  >  2 d^  holds.  Then  the  free  distance 
of  GWOW  code  is  lower  bounded  by 

d(GWOW)  >  (d™d%\...,d$d%\...J£>d%)). 

In  the  full  paper  with  the  help  of  examples  we  show 
that  GWOW  codes  compared  to  the  ordinary  woven 
convolutional  codes  achieve  larger  free  distances  and/or 
higher  code  rates. 

Abstract  —  Woven  codes  with  outer  binary  block
codes  and  additional  permutation  are  presented. 
This  enables  the  construction  of  a  new  class  of  woven 
block  codes,  where  the  minimum  distance  is  about 
twice  the  product  of  the  minimum  distances  of  the 
component  codes. 

I.  Introduction 

Woven convolutional codes were introduced by Höst et al. in [1]. In
this paper we present a new encoder construction of woven codes,
namely woven codes with outer binary block codes and inner recursive
convolutional encoders. We show that by employing designed
permutations we can improve the distance properties of woven block
codes. Moreover, in the full paper, first simulation results for
woven block codes with outer single-parity-check codes are
presented.

II. Woven encoder

Figure 1: Woven encoder with permutation.

We use an encoder construction with $l^o$ rows of outer encoders
(see Figure 1), where we apply a unique permutation in each row.
Each information sequence $u_l^o$ is subdivided into M short blocks
of length $k^o$. Each short block is encoded with the same generator
matrix $G^o$. We call a codeword encoded by $G^o$ a basic codeword.
The encoded sequence consists of M basic codewords, each of length
$n^o$, i.e. $N = M n^o$ code bits. We obtain the output code
sequence $v_l^o$ of the l-th row after permuting these code bits.
Using an N × N matrix $P_l$ to describe the row-wise permutation, we
may express the encoding of the l-th output sequence as
$v_l^o = u_l^o (I_M \otimes G^o) \cdot P_l$, where $I_M$ is an M × M
identity matrix and $\otimes$ denotes the Kronecker product. A
permutation matrix $P_l$ is a non-singular matrix with a single one
in each row and each column; all other elements are zero. In the
following we describe a permutation as a function $\pi_l(\cdot)$,
where $\pi_l(i)$ denotes the position of the single one in the i-th
column of the permutation matrix $P_l$.

III. Permutation Design

Let $d^o$ and $d_f^i$ denote the minimum distance and the free
distance of the outer and inner codes, respectively. Let
$j^i_{min}$ denote the minimum j for which
$j^i_{min} = \min_j \{ j \mid a^{b,i}(j) \ge 2 d_f^i \}$ holds,
where $a^{b,i}(j)$ denotes the lower bound on the active burst
distance of the inner encoder [2]. If we do not restrict the $l^o$
permutations $\pi_l(\cdot)$ we obtain the following result.

Theorem 1. The minimum distance of the woven code with
$l^o \ge b^i j^i_{min}$ outer block codes satisfies the following
inequality:

$d^w \ge d^o d_f^i$.  (1)

In the following we consider designed permutations. Let p = N + 1 be
prime. We perform all multiplications in GF(p). For the l-th row we
use its own unique permutation $\pi_l(i) = i u_l$,
$i \in \{1, \ldots, N\}$, where $l \in \{1, \ldots, l^o\}$. Each
$u_l$ is a fixed element of GF(p) which satisfies the following
conditions:

$u_l \ge 2$,  (2)

$u_l \le N/(n^o - 1)$,  (3)

$u_l^{-1} \ge n^o$,  (4)

$|\delta_1 u_l - \delta_2 u_j| \ge 3$, $\forall\, \delta_1, \delta_2 \in \{-n^o + 1, \ldots, -1, 1, \ldots, n^o - 1\}$
and for any pair $l \ne j$; $l, j \in \{1, \ldots, l^o\}$.  (5)

Theorem 2. The minimum distance of the woven code with
$l^o \ge b^i j^i_{min}$ outer block codes (each encoded according to
formulas (2) - (5)) satisfies:

$d^w \ge (2d^o - 1)\, d_f^i$.  (6)

IV.  Example 

We  construct  a  woven  encoder  with  row-wise  permuta¬ 
tion,  where  we  use  G'(D)  =  (1,  Y+DfD? )  as  inner  generator 
matrix.  We  employ  1°  —  12  rows  of  single  parity  check  codes 

with  G°  =  ^  *  J  l  )  Each  row  consists  of  M  =  416 
basic  codewords.  For  the  permutations  we  use 

$u_l \in \{7, 10, 17, 23, 26, 29, 37, 40, 43, 49, 55, 61\}$

which satisfy conditions (2) - (5). The resulting woven code has
rate R = 1/3 and dimension K = 9984. With the minimum distances
$d^o = 2$ and $d_f^i = 5$ we obtain $d^w \ge 15$.
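
The numbers in this example can be cross-checked with a short script. It is a sketch under assumptions that are not spelled out in the text: the outer code is taken to be the (3,2) single parity check code ($k^o = 2$, $n^o = 3$, which is consistent with R = 1/3 and K = 9984), the conditions are read as $u_l \ge 2$, $u_l \le N/(n^o - 1)$, $u_l^{-1} \ge n^o$ with the inverse taken in GF(p), and $|\delta_1 u_l - \delta_2 u_j| \ge 3$ evaluated over the integers. The variable names are illustrative.

```python
from itertools import product

l_o, M, d_o, d_f_i = 12, 416, 2, 5      # values stated in the example
k_o, n_o = 2, 3                         # assumed (3,2) single parity check code
N = M * n_o                             # code bits per row
p = N + 1
u = [7, 10, 17, 23, 26, 29, 37, 40, 43, 49, 55, 61]

def is_prime(n):
    """Trial division, sufficient for small n."""
    return n > 1 and all(n % q for q in range(2, int(n ** 0.5) + 1))

deltas = [d for d in range(-n_o + 1, n_o) if d != 0]
cond2 = all(x >= 2 for x in u)
cond3 = all(x <= N / (n_o - 1) for x in u)
cond4 = all(pow(x, -1, p) >= n_o for x in u)     # modular inverse in GF(p)
cond5 = all(abs(d1 * u[l] - d2 * u[j]) >= 3
            for l, j in product(range(l_o), repeat=2) if l != j
            for d1, d2 in product(deltas, repeat=2))

K = l_o * M * k_o                       # information bits of the woven code
bound = (2 * d_o - 1) * d_f_i           # right-hand side of the Theorem 2 bound
print(is_prime(p), cond2, cond3, cond4, cond5, K, bound)
```

Under this reading, the listed multipliers pass all four checks, p = 1249 is prime, and the dimension and distance bound agree with the values quoted in the example.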

Abstract — A new family of binary convolutional codes is introduced:
the maximum slope (MS) code family. MS codes are defined such that
there exists no other rate R = b/c binary convolutional code with
the same free distance $d_f$ and overall constraint length $\nu$
whose lower bounds on the active distance family exhibit a larger
slope. Tables for the rate R = 1/2 maximum slope code family with
memory m = 1, 2, ..., 5 are given. Furthermore, tables for new rate
R = (c - 1)/c, c = 2, 3, ..., 5, punctured convolutional codes with
optimum free distance codes and MS mother codes are given.

Simulation results for woven convolutional codes with MS component
codes are presented. It is shown that the choice of component code
makes a tradeoff between $d_f$ and $\alpha$.

I.  Introduction 

The active distance family was recently introduced in [1]. It is a
new type of distance measure on binary convolutional codes. For
example, the active burst distance $a_j^b$ is defined as the minimal
Hamming weight among all c-tuple code sequences of length j that
start and terminate in the all-zero state and do not have
consecutive all-zero encoder state transitions associated with
all-zero input. The active burst distance determines the error
correcting capability of the code. All other active distances are
defined in the same manner, but for different sets of start and
terminal states. In [2] it is proven that asymptotically in j the
minimum weight code sequence follows a cycle in the encoder state
diagram. Hence, the minimum average weight growth is given by the
cycle with smallest average weight, and the active distances are
lower bounded by linearly increasing functions with slope $\alpha$.
Upper and lower bounds on $\alpha$ were derived in [3] and [4].

In [5, 6] encoding properties and decoding aspects of woven
convolutional codes are discussed. The free distance and the slope
of the component codes used in this construction essentially
describe the woven convolutional code active distances. Furthermore,
it is shown that the bit error rate performance of woven
convolutional codes depends strongly on these parameters.

II.  Maximum  Slope  Codes 

The computation of the active distances is realized by using
transfer function methods based on the encoder state transition
matrix. An effective and efficient method to compute $\alpha$ for
small overall constraint lengths is presented. Some rate R = 1/2 MS
codes with memory m = 2, ..., 5 are given in the table below.


m    d_f    α      G(D)
2    4      2/3    (7 6)
     5      1/2    (7 5)
3    5      4/7    (15 14)
     6      1/2    (15 17)
4    7      3/8    (31 35)
5    7      4/9    (70 65)
     8      2/5    (76 65)
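
For the two memory m = 2 entries, the slope can be reproduced by brute force. The sketch below does not use the transfer function method mentioned above. It assumes that the slope equals the minimum average Hamming weight per branch over all cycles of the encoder state diagram other than the all-zero self-loop, and it simply enumerates the simple cycles. The tap vectors are an assumed reading of the octal generators, with the coefficient of $D^0$ first, so (7 6) becomes (1,1,1), (1,1,0) and (7 5) becomes (1,1,1), (1,0,1).

```python
from fractions import Fraction
from itertools import product

def build_edges(taps, memory):
    """State diagram of a rate 1/c feedforward encoder, zero self-loop removed."""
    zero = (0,) * memory
    edges = {}
    for state in product((0, 1), repeat=memory):
        for u in (0, 1):
            window = (u,) + state                 # (u_t, u_{t-1}, ..., u_{t-m})
            weight = sum(sum(t & w for t, w in zip(tap, window)) % 2
                         for tap in taps)         # Hamming weight of the branch
            nxt = window[:-1]
            if state == zero and nxt == zero:
                continue                          # exclude the all-zero loop
            edges.setdefault(state, []).append((nxt, weight))
    return edges

def slope(taps, memory):
    """Minimum average branch weight over all simple cycles."""
    edges = build_edges(taps, memory)
    means = []

    def walk(start, node, weight, length, seen):
        for nxt, w in edges[node]:
            if nxt == start:                      # cycle closed
                means.append(Fraction(weight + w, length + 1))
            elif nxt > start and nxt not in seen: # start is the smallest state
                walk(start, nxt, weight + w, length + 1, seen | {nxt})

    for start in edges:
        walk(start, start, 0, 0, {start})
    return min(means)

print(slope(((1, 1, 1), (1, 1, 0)), 2))   # generator (7 6)
print(slope(((1, 1, 1), (1, 0, 1)), 2))   # generator (7 5)
```

Restricting the search to simple cycles is sufficient because any cycle decomposes into simple cycles, so the minimum mean weight is always attained by one of them. For these two generators the result agrees with the slopes 2/3 and 1/2 listed in the table.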

III.  Simulation  Results 

The following figure depicts simulation results for the R = 1/4
terminated woven convolutional codes $(l_o, 1)$. On the left side no
permutation was performed; on the right side row-wise permutation
was applied.


The bit error performance of serially concatenated Turbo codes shows
a similar behavior. Hence, the slope of the component codes is an
important design parameter for serially concatenated codes with
additional interleaving.

Abstract  —  An  iterative  decoding  scheme  for  woven
convolutional  codes  is  presented.  It  is  called  pipeline 
decoding  and  operates  in  a  window  sliding  over  the 
received  sequence.  This  exploits  the  nature  of  convo¬ 
lutional  codes  as  sequences  and  suits  the  concept  of 
convolutional  encoding  and  decoding  as  a  continuous 
process.  The  pipeline  decoder  is  analyzed  in  terms  of 
decoding  delay  and  decoding  complexity. 

Additional  interleaving  for  woven  convolutional 
constructions  is  introduced  by  employing  a  convolu¬ 
tional  scrambler.  It  is  shown  that  some  types  of  inter¬ 
leaving  preserve  the  lower  bound  on  the  free  distance 
of  the  original  woven  construction. 

Simulation  results  for  woven  convolutional  codes 
are  presented. 

I.  Introduction 

In [1, 2] three related woven constructions were introduced, viz.,

• woven convolutional codes with outer warp $(l_o, 1)$,

• woven convolutional codes with inner warp $(1, l_i)$,

• the twill $(l_o, l_i)$,

where $(l_o, l_i)$ denotes the number of encoders in the outer and
inner warps, respectively. The encoder for a woven convolutional
code is represented by a serial concatenation of two warps, both
consisting of a set of parallel convolutional encoders, see Fig. 1.
If $l_o$ and $l_i$ are relatively prime and large enough, the free
distance of the woven convolutional code satisfies

$d_{free} \ge d^o_{free}\, d^i_{free}$,  (1)

where $d^o_{free}$ and $d^i_{free}$ denote the free distances of the
outer and inner component code, respectively.


Fig.  1:  Encoder  for  the  twill. 


II.  Iterative  Decoding 

In  contrast  to  the  well  known  iterative  decoding  scheme  of 
serially  concatenated  truncated  convolutional  codes  [3,  4], 
the  presented  decoding  scheme,  called  pipeline  decoding,  op¬ 
erates  with  a  sliding  window  technique  over  the  received  se¬ 
quence.  For  the  symbol-by-symbol  a  posteriori  decoding  of 
the  inner  and  outer  component  codes  a  sliding  window  ver¬ 
sion  of  the  BCJR  algorithm  [5]  is  employed.  The  window  is 
separated  into  one  decision  window  of  size  Wd  and  one  delay 
window  of  size  wt,.  Based  on  the  sizes  of  these  windows  we 
analyze  the  decoding  delay  and  the  decoding  complexity  of 
the  W-BCJR,  as  well  as  that  of  the  pipeline  decoder.  Simu¬ 
lation  results  for  the  pipeline  decoder  are  presented. 

III.  Additional  Interleaving 

Additional  interleaving  can  significantly  improve  the  bit  er¬ 
ror  performance  at  low  signal  to  noise  ratios.  To  preserve 
the  convolutional  code  structure  of  the  overall  code,  we  use 
convolutional  scramblers  for  interleaving. 

It is shown that the woven construction can apply additional random
interleaving without violating the lower bound (1) on the free
distance of the original construction. Furthermore, additional
interleaving can be applied to reduce the number of encoders, $l_o$
and $l_i$, while the lower bound (1) still holds.

IV.  Simulation  Results 

Simulation  results  show  that  terminated  woven  convolu¬ 
tional  codes  are  attractive  alternatives  to  both  parallel 
and  serial  concatenation  of  convolutional  codes,  e.g.,  Turbo 
codes. 

Abstract — The problem of predicting the next outcome of an
individual binary sequence, based on past observations which are
corrupted by arbitrarily varying memoryless additive noise, is
considered. The goal of the predictor is to perform, for each
individual sequence, “almost” as well as the best in a set of
experts, where performance is evaluated using a general loss
function. This setting is a generalization of the original problem
of universal prediction of individual sequences relative to a set of
experts (cf., e.g., [2] and the many references therein).

I.  Introduction 

The noise model considered in this work is that where the
observation available to the predictor to make its prediction for
time t is the vector $(y_1, \ldots, y_{t-1})$, where
$y_t = x_t + r_t$, $x_t$ is the clean bit at time t, and
$r = \{r_t, t \ge 1\}$ is some arbitrarily varying memoryless noise
process. The additive noise model considered in this work differs
from the binary-valued noise model considered in [1], [3]-[5] (where
the observed bit is the bitwise XOR of the clean bit and the noise
bit) and joins it to give a more complete picture for the noisy
setting [6]. It is shown that even in this noisy environment, when
the information available regarding the past sequence is incomplete,
a predictor can be guaranteed to successfully compete with a whole
set of prediction schemes in considerably strong senses.
Furthermore, these performance guarantees are valid for
a  very  large  family  of  noise  processes,  though  the  predictor 
itself  does  not  depend  on  the  statistical  characterization  of 
the  particular  active  noise  process  within  this  class.  In  other 
words,  it  is  twofold  universal  where,  in  this  context,  twofold 
universality  stands  for  universality  in  the  usual  sense  (w.r.t. 
the  experts  in  the  class  and  all  possible  sequences)  and  w.r.t. 
a  family  of  noise  distributions. 

II.  Statement  of  the  Problem  and  Main  Results 

Let $L : \{0,1\} \times [0,1] \to [0,\infty]$ be a fixed loss
function. A predictor (or an expert) $F = \{F_t\}_{t \ge 1}$ is a
sequence of functions where $F_t : \mathbb{R}^{t-1} \to [0,1]$. We
define the cumulative loss of the predictor F, fed by
$y^n = (y_1, \ldots, y_n)$ and judged with respect to
$x^n = (x_1, \ldots, x_n) \in \{0,1\}^n$, by
$L_F(y^n, x^n) \stackrel{def}{=} \sum_{t=1}^{n} L(x_t, F_t(y^{t-1}))$.
We consider the case where the noisy observation accessible to the
predictor, $y = (y_1, y_2, \ldots) \in \mathbb{R}^\infty$, is given
by $y_t = x_t + r_t$, $t \ge 1$, where $r = \{r_t, t \ge 1\}$ is a
zero-mean, memoryless, arbitrarily varying process: for every n, the
p.d.f. governing $r^n = (r_1, \ldots, r_n)$ is of the form
$f(r^n|s^n) = \prod_{i=1}^{n} f(r_i|s_i)$, where $s^n \in S^n$ is
some unknown arbitrary sequence of states, and S is some abstract
state-space such that for all $\sigma \in S$ we have
$\int_{-\infty}^{\infty} r \cdot f(r|\sigma)\,dr = 0$.

1The research, which is supported by the Israeli Science Foundation,
is part of the D.Sc. dissertation of the first author. Both authors
are with the Department of Electrical Engineering, Technion - Israel
Institute of Technology, Haifa 32000, Israel.


Letting $L_F(x^n) = E\,L_F(y^n, x^n)$ denote the expected loss of F
when the underlying individual sequence is $x^n$, we define the
worst-case relative expected loss of a predictor P by
$R_n(P, \mathcal{F}) = \max_{x^n \in \{0,1\}^n} (L_P(x^n) - \inf_{F \in \mathcal{F}} L_F(x^n))$.
It is shown that, for a large class of loss functions, for any
finite set of experts $\mathcal{F}$, there exists a predictor P such
that $R_n(P, \mathcal{F}) = O((\ln n)^2 \cdot \ln|\mathcal{F}|)$,
while for another class of loss functions we have
$R_n(P, \mathcal{F}) = O(\sqrt{n \ln|\mathcal{F}|})$.

Further  results  show,  however,  that  the  prediction  strate¬ 
gies  that  we  suggest  are  guaranteed  to  be  doing  well  in  con¬ 
siderably  stronger  senses.  It  is  shown  that  under  some  mild 
additional  conditions  on  the  noise  process,  the  predictor  P 
satisfies 


limsup  <c  a.s.  V,e{0.1}-, 


for  some  deterministic  constant  c.  It  is  further  shown  that, 
using  this  same  predictor,  we  also  have 


$\max_{x^n \in \{0,1\}^n} \Pr\{\frac{1}{n}[L_P(y^n, x^n) - \min_{F \in \mathcal{F}} L_F(y^n, x^n)] > \epsilon\} \le \exp\{-n(I(\epsilon, B) + o(n))\}$,

where $I(\epsilon, B) > 0$, which lower bounds the possible
exponential rate of the decay, is independent of the expert class
$\mathcal{F}$.

The  remarkable  feature  of  the  predictors  that  we  employ 
is  the  strong  sense  in  which  they  are  twofold  universal.  The 
above  described  performance  bounds  hold  with  the  same  uni¬ 
versal  predictor  P,  regardless  of  the  particular  state  sequence 
driving  the  noise  process. 

Abstract  —  We  investigate  on-line  prediction  of  indi¬
vidual  sequences.  Given  a  class  of  predictors,  the  goal 
is  to  predict  as  well  as  the  best  predictor  in  the  class, 
where  the  loss  is  measured  by  the  self  information 
(logarithmic)  loss  function.  The  excess  loss  (regret) 
is  closely  related  to  the  redundancy  of  the  associated 
lossless  universal  code.  Using  Shtarkov’s  theorem  [3] 
and  tools  from  empirical  process  theory,  we  prove  a 
general  upper  bound  on  the  best  possible  (minimax) 
regret.  The  bound  depends  on  certain  metric  proper¬ 
ties  of  the  class  of  predictors  and  is  applicable  to  both 
parametric  and  nonparametric  classes  of  predictors. 

I.  Summary 

Assume that elements of an arbitrary sequence $y_1, \ldots, y_n$ are
revealed one by one, where the elements $y_t$ belong to some
measurable set $\mathcal{Y}$. At each time t = 1, ..., n, before
revealing an element $y_t$, we are asked to assign a probability
density $p_t$ on $\mathcal{Y}$ and then observe $y_t$, incurring the
logarithmic loss $-\ln p_t(y_t)$. Our total loss at the end is the
sum of the losses suffered at each round. As we know the prefix
$y_1, \ldots, y_{t-1}$ before choosing each probability assignment
$p_t$, we may view each $p_t$ as the conditional
$p(\cdot \mid y_1, \ldots, y_{t-1})$ of some joint distribution p
that we choose before the game begins. We call p a prediction
strategy. Any strategy for playing this game is equivalent to a
probability distribution on $\mathcal{Y}^n$.

Our goal is to predict (almost) as well as the best strategy in a
given “reference” set of strategies. We call “experts” the
strategies in the reference set. In other words, we intend to
accumulate a loss not much larger than that of the best expert,
regardless of what the sequence $y_1, \ldots, y_n$ might be.

In this paper we investigate the minimum excess loss, with respect
to the total loss of the best expert, achievable on any sequence.
This quantity, known as minimax regret (under logarithmic loss),
turns out to depend on certain metric properties of the class
$\mathcal{F}$ of experts.

It is well-known that every sequential prediction strategy may be
converted into a sequential lossless source code. Conversely, every
uniquely decodable code over $\mathcal{Y}^n$ defines a probability
distribution. Thus, the prediction problem under logarithmic loss is
formally equivalent to the problem of sequential universal coding in
data compression. In this context, the subject of our study is the
smallest achievable worst-case redundancy of a sequential lossless
code, with respect to a general class of reference codes.

This  research  was  supported  in  part  by  ESPRIT  Working 
Group  EP  27150,  Neural  and  Computational  Learning  II  (Neuro- 
COLT  II)  and  DGES  grant  PB96-0300.  The  first  author  was  also 
partially  supported  by  MURST  project  “Modelli  di  calcolo  innova¬ 
tive  metodi  sintattici  e  combinatori”. 


Fix a class $\mathcal{F}$ of “reference” strategies, called here
experts. The worst-case regret of a strategy p (with respect to the
class $\mathcal{F}$) is defined by

$R_n(p, \mathcal{F}) = \sup_{y^n} \ln \frac{\sup_{f \in \mathcal{F}} f(y^n)}{p(y^n)}$.

In other words, $R_n(p, \mathcal{F})$ is the worst-case difference
between the log-likelihood of $y^n$ under the density p and the
log-likelihood of $y^n$ under its maximum likelihood estimator in
the class $\mathcal{F}$. The smallest worst-case regret achievable
by any predictor is the minimax regret

$R_n(\mathcal{F}) = \inf_p \sup_{y^n} \ln \frac{\sup_{f \in \mathcal{F}} f(y^n)}{p(y^n)}$

where the infimum is taken over all densities p on $\mathcal{Y}^n$.
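
For a finite alphabet the minimax regret can be computed directly, which gives a concrete instance of the definition. The sketch below is an illustration under assumptions: it takes binary sequences, uses the class of i.i.d. Bernoulli experts as the reference class, and relies on Shtarkov's theorem (cited in the abstract as [3]), by which the minimax regret equals the logarithm of the sum of maximized likelihoods and is attained by the normalized maximum likelihood (NML) strategy. The function names are illustrative.

```python
import math
from itertools import product

def max_likelihood(seq):
    """Largest probability any Bernoulli(q) expert assigns to the sequence."""
    n, k = len(seq), sum(seq)
    return (k / n) ** k * ((n - k) / n) ** (n - k)

def minimax_regret(n):
    """ln of the Shtarkov sum over all binary sequences of length n."""
    return math.log(sum(max_likelihood(s) for s in product((0, 1), repeat=n)))

def nml_regret(seq):
    """Regret of the normalized maximum likelihood strategy on one sequence."""
    n = len(seq)
    shtarkov = sum(max_likelihood(s) for s in product((0, 1), repeat=n))
    p = max_likelihood(seq) / shtarkov        # NML probability of seq
    return math.log(max_likelihood(seq) / p)

for n in (1, 2, 4, 8):
    print(n, round(minimax_regret(n), 4))
```

The NML strategy has the same regret on every sequence, which is why no other strategy can have a smaller worst case. For this one-parameter class the regret grows roughly like (1/2) ln n.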

To any class $\mathcal{F}$ of experts, we associate the metric d
defined by

$d(f, g) = \left(\sum_{t=1}^{n} \sup_{y^t} \left(\ln f(y_t|y^{t-1}) - \ln g(y_t|y^{t-1})\right)^2\right)^{1/2}$.  (1)


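We use $N(\mathcal{F}, \epsilon)$ to denote the ε-covering number of
$\mathcal{F}$ under the metric d, that is, the cardinality of the
smallest subset $\mathcal{F}' \subset \mathcal{F}$ such that

$(\forall f \in \mathcal{F})(\exists g \in \mathcal{F}')\ d(f, g) \le \epsilon$.

Our main result is the following:

Theorem 1 For any class $\mathcal{F}$ of experts,

$R_n(\mathcal{F}) \le \inf_{\epsilon > 0} \left(\ln N(\mathcal{F}, \epsilon) + 24 \int_0^{\epsilon} \sqrt{\ln N(\mathcal{F}, \delta)}\, d\delta\right)$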

The theorem improves on previous general results in [1] and [2]. We
may use the theorem to obtain tight upper bounds for
$R_n(\mathcal{F})$ for both parametric and nonparametric classes.
For example, for parametric classes we obtain:


Corollary  1  Assume  that  there  exist  positive  constants  k  and 
c  such  that  for  all  e  >  0,  In  N(fF,  e)  <  k  In  .  Then 

Rn  (IF)  <  ^  In  n  +  o(ln  n)  . 

Abstract  —  We  address  the  problem  of  filtering  and
prediction  of  an  individual  binary  sequence  based  on 
its  noisy  past,  as  an  extension  to  [1].  The  perfor¬ 
mance  criterion  investigated  is  the  expected  fraction 
of  errors.  We  propose  algorithms  and  compare  their 
performance  to  that  of  the  best  finite  state  machine 
(FSM).  We  improve  on  previous  results  [1]  by  show¬ 
ing  that  optimum  performance  can  be  achieved  by 
Lempel-Ziv-based  estimation  algorithms. 

I.  Introduction 

Let $\theta_1, \theta_2, \ldots$ be an arbitrary binary sequence
corrupted by a Bernoulli noise process $\nu_1, \nu_2, \ldots$ with
$\Pr\{\nu_i = 1\} = p$. An observer accesses the noisy sequence
$y_1, y_2, \ldots$, where $y_i = \theta_i \oplus \nu_i$, and
$\oplus$ denotes addition modulo 2. The observer is interested in
either estimating $\theta_i$ (filtering), or predicting
$\theta_{i+1}$ (prediction), based on $y_1, y_2, \ldots, y_i$. We
seek a universal estimator whose bit error probability is
essentially as small as that of the best FSM, simultaneously for all
$\theta$. Previous work
[3], [2]  can  be  viewed  as  a  special  case  of  this  filtering  prob¬ 
lem,  where  in  [2]  it  was  shown  that  there  exists  a  sequential 
estimator  whose  asymptotic  performance  is  as  good  as  that 
of  the  best  estimator  that  is  implementable  by  a  single-state 
machine.  In  [2],  prediction  without  noise  was  considered  and 
a  sequential  LZ-based  predictor  was  shown  to  attain  the  fi¬ 
nite  state  predictability  of  all  infinite  sequences.  In  this  work, 
we  improve  on  previous  results  of  [1]  where  an  asymptotically 
optimum  sequential  algorithm  with  growing  memory  was  in¬ 
troduced.  In  this  work  we  present  a  more  practical,  LZ-based 
algorithm  that  achieves  the  same  goal. 

A finite-state filter (FSF) with $S$ states is a causal device
that, upon receiving a sequence of observations $y_1, y_2, \ldots$,
generates a sequence of estimates $\hat\theta_1, \hat\theta_2, \ldots$, while going through
a sequence of states $s_1, s_2, \ldots$ that take on values in a finite
set $\mathcal{S} = \{1, 2, \ldots, S\}$. The mechanism of the FSF is defined
by a pair of deterministic functions $f$ and $g$, where $f$ is the
output function that is given by $\hat\theta_i = f(s_i, y_i)$ for filtering
and $\hat\theta_{i+1} = f(s_i)$ for prediction, and $g$ is the next-state
function that defines a recursive state update rule, according to
$s_{i+1} = g(s_i, y_i)$. Let $G_S$ be the set of all next-state functions
of no more than $S$ states. Henceforth, $x_i^j$, $i < j$, generically
designates $(x_i, x_{i+1}, \ldots, x_j)$. Also, denote by $g_k$ the $k$-th order
Markovian next-state function whose state at time instant $t$ is
defined by $s_t = y_{t-k}^{t-1}$. For a given $(f, g)$, let
$e(\theta_1^n, \nu_1^n, (f, g)) = \frac{1}{n} \sum_{t=1}^{n} 1\{\hat\theta_t \ne \theta_t\}$
be the fraction of errors attained when $(f, g)$
is applied to $y_1^n$. Let $e_g(\theta_1^n) = \min_f E\{e(\theta_1^n, \nu_1^n, (f, g))\}$,
and define the FS filterability of an infinite sequence $\theta$ by
$\epsilon(\theta) = \lim_{S \to \infty} \lim_{n \to \infty} \min_{g \in G_S} e_g(\theta_1^n)$. The aperiodic FS
filterability, $\epsilon(\theta)$, is defined similarly, with the exception
that the minimization is over the class of aperiodic machines,
and the Markovian filterability, $\mu(\theta)$, is defined by
$\lim_{k \to \infty} \lim_{n \to \infty} e_{g_k}(\theta_1^n)$.

1 This work was supported by the ISF administered by the Israeli
Academy of Sciences and Humanities.

II.  Main  Results 

Our main result is a derivation of a scheme that asymptotically
achieves $\epsilon(\theta)$. This scheme is based on the incremental parsing
(IP) procedure of the LZ78 algorithm, and can be viewed
as a Markovian machine of increasing order. The transition
between states is identical to that of the equivalent scheme in
[2], apart from the fact that it is the noisy sequence $\{y_t\}$,
rather than the clean one, that determines the state sequence. The
state at time instant $t$ is the string of bits observed since the
last phrase terminated.
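
As an illustration of how these states arise, the following is a minimal Python sketch of LZ78 incremental parsing applied to the noisy bits. The function name `ip_states`, and the convention that the state at time t is recorded before the bit at time t is read, are assumptions made for this sketch rather than details taken from the scheme in [2].

```python
def ip_states(noisy_bits):
    """Return (states, phrases) for LZ78 incremental parsing.

    states[t] is the string of bits observed since the last phrase
    terminated, recorded just before the bit at position t is read.
    """
    phrases = {""}   # phrases parsed so far; the empty string is the root
    current = ""     # bits observed since the last phrase terminated
    states = []
    for bit in noisy_bits:
        states.append(current)
        current += str(bit)
        if current not in phrases:
            # A new phrase is complete: store it and return to the root.
            phrases.add(current)
            current = ""
    return states, phrases


if __name__ == "__main__":
    states, phrases = ip_states([1, 0, 1, 1, 0, 1, 0])
    print(states)
    print(sorted(phrases))
```

Because each new phrase extends an earlier phrase by one bit, the states grow longer over time, which is the sense in which the scheme behaves like a Markovian machine of increasing order.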

The estimation is as follows: denote by
$N_t^y(s, x) = \sum_{i=1}^{t} 1\{s_i = s,\, y_{i+1} = x\}$ the joint count of state $s$ and the value
of the next noisy bit being $x$, and let $\tilde{N}_t^y(s, x)$ be a randomized
version of it, i.e., $\tilde{N}_t^y(s, x) = N_t^y(s, x) + z_x(t) (N_t(s))^{1/2}$,
where $\{z_x(t)\}_{t=1}^{n}$, $x \in \{0, 1\}$, are independent r.v.'s uniformly
distributed over the interval $[0, 1]$ and $N_t(s) = \sum_{i=1}^{t} 1\{s_i = s\}$.
Let $N_t(s, x) = \sum_{i=1}^{t} 1\{s_i = s,\, \theta_i = x\}$ be the joint count of
state $s$ and the value of the current clean bit being $x$.
Now, an estimation of $N_t(s, x)$ is performed:
$\hat{N}_t(s, x) = \frac{1}{1 - 2p} \sum_{i=1}^{t} 1\{s_i = s,\, y_i = x\} - \frac{p}{1 - 2p} N_t(s)$,
and some auxiliary randomization is introduced which results in
$\tilde{N}_t(s, x) = \hat{N}_t(s, x) + z_x(t) (N_t(s))^{1/2}$.
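
The passage from noisy counts to clean counts can be illustrated with a short sketch. It assumes that, given the state, each noisy bit is the clean bit flipped independently with probability p < 1/2, so that the expected noisy count is (1 - p) N(s, x) + p N(s, 1 - x); inverting this relation gives the correction used below. The function names and the dictionary layout are hypothetical, and the randomization term is left out.

```python
def expected_noisy_counts(clean, p):
    """Expected noisy counts [c0, c1] per state when each bit flips w.p. p."""
    return {
        s: [(1 - p) * n0 + p * n1, (1 - p) * n1 + p * n0]
        for s, (n0, n1) in clean.items()
    }


def debias_counts(noisy, p):
    """Estimate clean joint counts N(s, x) from noisy joint counts.

    noisy maps a state s to [c0, c1], where cx counts the visits to s
    whose current noisy bit equals x.
    """
    if not 0 <= p < 0.5:
        raise ValueError("this sketch assumes 0 <= p < 1/2")
    estimates = {}
    for s, (c0, c1) in noisy.items():
        visits = c0 + c1  # N(s): total number of visits to state s
        estimates[s] = [(c - p * visits) / (1 - 2 * p) for c in (c0, c1)]
    return estimates


if __name__ == "__main__":
    clean = {"0": [30, 10], "1": [5, 15]}
    print(debias_counts(expected_noisy_counts(clean, 0.1), 0.1))
```

Applied to the expected noisy counts, the correction returns the clean counts exactly, which is the unbiasedness property the estimate relies on.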

For prediction, the decision rule is $\hat\theta_{t+1} = x$ if
$\tilde{N}_t^y(s_t, x) > \tilde{N}_t^y(s_t, 1 - x)$. For filtering, the decision rule is
$\hat\theta_t = x$ if $\tilde{N}_t(s_t, x) > \frac{1 - p}{p} \tilde{N}_t(s_t, 1 - x)$, and otherwise
$\hat\theta_t = y_t$, where ties are broken arbitrarily. Denote by
$e_{IP}(\theta_1^n)$ the expected fraction of errors made by this scheme.

Our  first  result  is  that,  when  prediction  is  concerned 

eslW)  -  minggG5  e3(0" )  <  and  therefore 

p(6)  =  e(0).  When  both  filtering  and  prediction  are  con¬ 
cerned  We  Show  that  egk  (0" )  —  min9f=G,  ,g  aperiodic  eg(9i )  < 

0(a(S,p))fc  +  £  where  |a(5,p)|  <  1  which  implies  that  /i(0)  < 
e(0)  for  filtering.  We  further  show  that  eIP($i)  —  e9fc(0?)  < 

O  ^  ^  +  O  (5^7)-  Combining  these  two  observations 

it  follows  that  the  above  described  scheme  achieves  the  FS 
filterability. 
Abstract — An iterative algorithm is presented for
joint equalization and decoding of data that has been
transmitted over intersymbol interference (ISI) channels.
This differs from well-known “turbo equalization”
(TEQ) methods, in that the ISI is removed
with a soft-input soft-output (SISO) equalizer via
linear or decision feedback equalization (DFE). The
data is encoded with a convolutional code and interleaved
prior to transmission over the channel. At
the receiver, symbol estimates are successively refined
by passing extrinsic information, in the form of
priors over the symbols, between the SISO equalizer
and a SISO decoder based on maximum a posteriori
probability (MAP) symbol estimation. The low complexity
of this algorithm makes it a practical alternative
to existing methods, without sacrificing bit error
rate (BER) performance.

I.  Introduction 

Data  transmission  over  ISI  channels  is  a  classical  problem 
in  communication  scenarios.  Conventional  approaches  imple¬ 
ment  an  equalizer  to  remove  ISI  or  use  MAP  or  maximum  like¬ 
lihood  (ML)  detection.  Data  reliability  can  be  enhanced  using 
coding,  where  the  data  is  encoded  in  the  transmitter  prior  to 
transmission.  For  reasons  of  complexity,  the  receiver  then  typ¬ 
ically  performs  separate  equalization  and  decoding  of  the  data. 
Significant  performance  gains  can  be  achieved  through  joint 
equalization  and  decoding  at  the  cost  of  added  complexity.  A 
recent  approach  that  significantly  reduces  the  complexity  of 
joint  equalization  and  decoding  is  the  so-called  “turbo  equal¬ 
ization”  algorithm,  where  MAP/ML  detection  and  decoding 
are  performed  iteratively  on  the  same  set  of  received  data 
[4,  5].  It  has  recently  been  shown  that  passing  soft  informa¬ 
tion,  the  use  of  interleaving,  and  the  controlled  feedback  of  soft 
information  are  essential  requirements  to  achieve  performance 
gains  with  an  iterative  system  [1].  Various  algorithms  simi¬ 
lar  to  TEQ  have  been  proposed  to  overcome  the  complexity 
of  the  MAP/ML  algorithms,  especially  in  the  detector,  where 
complexity  is  exponential  in  the  channel  delay  spread  [2,  3]. 

An  algorithm  that  is  a  practical  alternative  to  turbo  equal¬ 
ization  is  presented  in  this  paper.  In  an  approach  similar  to 
that  of  Wang  and  Poor  [2],  the  MAP/ML  detector  in  the  TEQ 
setup  is  replaced  by  a  linear  equalizer  (LE)  or  DFE.  The  filter 
coefficients  are  selected  according  to  a  minimum  mean  squared 
error  criterion  (MMSE),  taken  over  both  the  statistics  of  the 
noise  and  the  prior  over  the  symbols. 

II.  Concepts 

A block diagram of the data transmission system is shown in
Figure 1. In the receiver, the SISO equalizer and SISO decoder
exchange priors over the possible values of each code symbol
$c_n$. The SISO equalizer consists of an estimator, providing
the estimates $\hat{x}_n$ of the transmitted symbols $x_n$, followed by
a mapping that transforms $\hat{x}_n$ to a prior over the transmitted
symbol at time $n$. The SISO decoder uses this soft information
to decode the data and produce an additional prior over the
symbols, which can be interpreted as soft feedback information
for the equalizer. The SISO equalizer minimizes the MMSE
cost function $E\{|x_n - \hat{x}_n|^2\}$ using the time-varying statistics
$E\{x_n\}$ and $\mathrm{Cov}\{x_n, x_n\}$, which are computed for each received
symbol using the soft feedback information [1].

Figure 1: Data Transmission System

1 This work was supported by NSF Grant CCR 99-79381.

For the SISO equalizer, a time-recursive update algorithm
with $O(N^2 + M^2)$ (exact implementation) and $O(N + M)$
(approximate) complexity per received symbol and iteration was
developed [1], where $M$ is the ISI channel length and $N$ the
length of the equalization filter. Both implementations yield
significant savings in computational complexity compared
to MAP/ML-based detectors with $O(q^M)$ complexity, where
$q$ is the size of the alphabet of the transmitted symbols $x_n$.

III.  Results 

From the set of possible equalizer implementations, the exact
implementation of the LE-based SISO equalizer performs best
in terms of BER and can match or beat the performance of
the approach in [3] and even the MAP-based TEQ approach
in [5]. The DFE-based solutions are shown to perform worse
than LE-based solutions [1]. The performance improvements
of the proposed algorithm over the TEQ approach, for
certain ISI channels and data block lengths, demonstrate that
BER-optimum SISO receiver elements (detector, decoder) are
not necessarily optimum in an iterative setup [1].
Abstract — We propose to compute the distance
spectrum of arbitrary trellis codes (including convolutional
codes, trellis-coded modulation, continuous
phase modulation, etc.) and intersymbol-interference
(ISI) channels by means of a modified list Viterbi
algorithm (LVA). This search procedure (i) is computationally
efficient, (ii) is applicable to linear as well
as nonlinear codes, (iii) can be applied to arbitrary
distance measures, (iv) can be used for MLSE as well
as RSSE or related techniques, and (v) guarantees
that an ordered list of the N nearest error paths is
produced. Sample results illustrate the distance
spectra of linear ISI channels, both for MLSE and
ideal RSSE receivers.

I.  Introduction 

Prior  solutions  to  compute  the  free  distance  of  nonlinear  codes 
include  sequential  algorithms,  the  Viterbi  algorithm,  and  the 
Dijkstra  algorithm.  Solutions  to  compute  the  distance  spec¬ 
trum  include  sequential  algorithms,  transfer  function  meth¬ 
ods,  and  a  modified  Viterbi  algorithm  with  state-splitting  and 
multiple  passes,  among  other  techniques.  For  special  applica¬ 
tions  and  particularly  in  the  case  of  linear  codes  extensive 
simplifications  are  possible. 

In the present paper, we propose to apply a modified LVA
for the purpose of computing the distance spectrum of arbitrary
trellises. LVAs compute an ordered list of the N best
paths. Serial and parallel LVAs have been investigated extensively
in [1] and the references therein in the context of decoding
and related applications but, to the best of our knowledge,
not for distance calculations.

II.  Distance  Calculation  using  an  LVA 

Throughout  this  paper,  we  assume  the  existence  of  a  trellis 
with  a  finite  number  of  states.  We  consider  a  linear  code  first. 
In  order  to  compute  the  distance  spectrum  with  N  error  paths, 
it  is  sufficient  to  design  a  modified  LVA  for  the  original  trellis 
taking  the  N  best  survivors  into  account,  and  to  apply  this 
LVA  given  noise-free  channel  outputs.  An  ordered  list  of  the  N 
nearest  error  paths  is  produced,  if  the  following  modifications 
are  done: 

1. All error paths taken into account must diverge from
the transmitted sequence at time k = 0 and re-merge
at time k' > 0. All other paths must be excluded,
particularly the ML path and all paths that diverge more
than once from the transmitted path.

2. Instead of outputting the N most likely information
sequences [1], we output the accumulated path metrics
(i.e., the distances) and the corresponding path weight
(multiplicity) and/or information weight $c_d$.

Without loss of generality, the transmitted sequence may be
the all-zero sequence. Then, the all-zero path (i.e., the ML
path) may be eliminated by setting the N accumulated distances
of the all-zero state at the second interval of the trellis
to infinity. The number of spectral lines is less than the number
of error paths N actually computed when the multiplicity
$a_d > 1$ for at least one distance $d$. Whether a serial or a parallel
type of LVA should be used depends on memory
and complexity constraints, among others. If the trellis is of
finite length, the LVA may operate on the full trellis; otherwise
a stop criterion must be applied.

These general design criteria also hold for nonlinear codes,
which are discussed next. In the general case, we design the
LVA to operate on the product trellis in order to take all error
events into account. If the error events depend on difference
symbols only, we may use the difference trellis instead. This
is the case, e.g., for linear ISI channels and CPM. In any case,
the symmetry of the error states has to be taken into account,
either by eliminating redundant error events or by reducing
the number of states.

For illustration, consider a time-invariant linear ISI channel
with binary inputs $a_k \in \{\pm 1\}$ and channel coefficients $h_l$,
$0 \le l \le L$. The difference symbols, $d_k = a_k - \hat{a}_k$, take the
values $\{-2, 0, +2\}$. In case of MLSE, the difference trellis has
$3^L$ states, whereas the original trellis has $2^L$ states. However,
due to the symmetry of the error states, we can use an equivalent
difference trellis that has only $(3^L + 1)/2$ states. Without
loss of generality, we may assume that the all-zero difference
sequence has been transmitted. A new spectral line is computed
whenever an error path re-merges with the all-zero difference
path. In case of reduced-state sequence estimation (RSSE),
the original trellis has $2^K$ states, where $0 < K < L$. A new
spectral line is computed whenever an error path merges in
one of the $(3^{L-K} + 1)/2$ hyper states. Otherwise, the search
algorithm is the same as described above.
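
The state counts quoted above can be checked by direct enumeration. The sketch below, built around the hypothetical helper `difference_state_counts`, lists all difference states with entries in {-2, 0, +2} and identifies each state with its negative. Reading "the symmetry of the error states" as this sign symmetry is an assumption of the sketch.

```python
from itertools import product


def difference_state_counts(L):
    """Return (original, difference, reduced) state counts for memory L.

    original:   states of the binary-input trellis, 2**L
    difference: states of the difference trellis, 3**L
    reduced:    difference states counted up to a global sign flip
    """
    classes = set()
    for state in product((-2, 0, 2), repeat=L):
        negated = tuple(-d for d in state)
        # Use the lexicographically larger of the pair as the class label.
        classes.add(max(state, negated))
    return 2 ** L, 3 ** L, len(classes)


if __name__ == "__main__":
    for L in range(1, 6):
        print(L, difference_state_counts(L))
```

Only the all-zero state is its own negative, so the remaining 3^L - 1 states pair up, which gives the (3^L + 1)/2 figure.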

Fig.  1  shows  the  (truncated)  distance  spectrum  for  an  ISI 
channel  given  MLSE  and  ideal  RSSE.  (Ideal  RSSE  does  not 
take  error  propagation  into  account.)  The  information  weight 
is  moderate  for  error  paths  with  small  distance,  whereas  larger 
spectral  lines  have  a  larger  multiplicity  and  are  less  spread  out. 


Fig.  1:  Distance  spectrum  for  MLSE  and  ideal  RSSE  for  a  binary, 
linear,  time-invariant  ISI  channel  with  32  states. 
Abstract — For the Gaussian channel with intersymbol
interference (ISI), it is known that there is no loss in channel
capacity if the receiver is an ideal minimum mean-squared error
(MMSE) decision-feedback equalizer (DFE) with error-free
feedback. However, combining the DFE with channel coding is
problematic. Transmitter precoding and reduced-state sequence
estimation are two common approaches (cf. [1] and references
therein). This paper introduces a new successively-decodable
coding technique that effectively combines channel coding with
decision-feedback that is housed in the receiver.

I.  The  Channel  Model 

Consider the real-valued discrete-time Gaussian channel with
intersymbol interference (ISI) represented by

$$y_k = \sum_{j=0}^{M-1} h_j x_{k-j} + n_k, \qquad (1)$$

where $\{x_k\}$ is a sequence of zero-mean, independent identically-
distributed (i.i.d.) transmitted symbols with power $E[x_k^2] = p$,
$\{h_k\}_{k=0}^{M-1}$ is the finite-tap discrete-time, post-cursor channel
response, and $\{n_k\}$ is an i.i.d. sequence of zero-mean Gaussian noise
samples with variance $E[n_k^2] = \sigma^2$. The average mutual information
of the channel (bits per channel use) is maximized when the symbol
distributions are zero-mean Gaussian random variables with power $p$.
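
As a concrete reading of the channel model (1), the following sketch simulates the channel output for a finite input block. The function name `isi_channel`, the convention that symbols before the start of the block are zero, and the fixed random seed are assumptions made for illustration.

```python
import random


def isi_channel(x, h, sigma, rng=None):
    """Return y_k = sum_j h[j] * x[k - j] + n_k for k = 0 .. len(x) - 1.

    x:     transmitted symbols (symbols before index 0 are taken as zero)
    h:     post-cursor channel response h_0 .. h_{M-1}
    sigma: standard deviation of the i.i.d. zero-mean Gaussian noise
    """
    rng = rng or random.Random(0)
    y = []
    for k in range(len(x)):
        signal = sum(h[j] * x[k - j] for j in range(len(h)) if k - j >= 0)
        y.append(signal + rng.gauss(0.0, sigma))
    return y


if __name__ == "__main__":
    symbols = [1.0, -1.0, 1.0, 1.0]
    print(isi_channel(symbols, [1.0, -0.6], sigma=0.1))
```

With `sigma` set to zero the output reduces to the plain convolution of the symbols with the channel response, which is a convenient sanity check.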

II.  Successively  Decodable  Coding  Technique 

We describe the two-level successive decoder for the ISI channel,
from which the corresponding coding technique is easily inferred.
Begin by blocking the channel output sequence into vectors of length
$L$. We view this vector output sequence as $N$ distinct vector channels,
the $n$-th of which is given by

$$\left\{ \left[ y_{(kN+n-1)L+1}, \ldots, y_{(kN+n)L} \right]^T \right\}_{k=-\infty}^{\infty}. \qquad (2)$$

Note that $y_k$ is statistically independent of $y_{k-M}$. Therefore, if
$N \ge \lceil (M + L - 1)/L \rceil$, then the output sequence of the $n$-th channel is
the output sequence of a memoryless vector channel. Thus, we have
decomposed the ISI channel into $N$ memoryless vector channels that
are statistically related to each other.
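
The independence argument can be checked numerically. Since outputs that are at least M samples apart are independent, two consecutive blocks of the same vector channel are independent once the first index of the later block exceeds the last index of the earlier block by at least M. The sketch below derives the smallest such N from this requirement; the closed form in the code is derived here rather than quoted, and the function names are hypothetical.

```python
def min_channels(M, L):
    """Smallest N such that (N - 1) * L + 1 >= M, i.e. ceil((M + L - 1) / L)."""
    return (M + 2 * L - 2) // L


def block_indices(n, k, N, L):
    """Output indices forming the k-th block of the n-th vector channel."""
    start = (k * N + n - 1) * L + 1
    return list(range(start, start + L))


if __name__ == "__main__":
    M, L = 7, 3
    N = min_channels(M, L)
    first = block_indices(1, 0, N, L)
    second = block_indices(1, 1, N, L)
    print(N, first, second, second[0] - first[-1])
```

For M = 7 and L = 3 this gives N = 3: the blocks of channel 1 cover indices 1 to 3 and 10 to 12, so the closest pair of samples is exactly M apart.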

Outer-level coding allows the $N$ vector channels to be decoded
one at a time, starting with channel 1 and ending with channel $N$.
If, when decoding the $n$-th channel, we make use of symbol decisions
from the channels that have already been decoded (i.e., vector
channels 1 through $n - 1$), we refer to this as inter-channel feedback.
Clearly, the potential advantage of inter-channel feedback increases
with $N$, the number of vector channels.

Inner-level coding addresses each vector channel by viewing it
as consisting of $L$ scalar sub-channels that are successively decoded
with single-user coders and decoders. If, when decoding the $l$-th
sub-channel of a particular vector channel, we make use of symbol
decisions from sub-channels that have already been decoded (i.e.,
sub-channels 1 through $l - 1$), we refer to this as intra-channel feedback.
Since this vector channel can be cast as a memoryless multiple-access
channel, the optimal successive-decoding technique developed in [2]
can be implemented. For any given vector channel, performance
potential will improve as $L$, the block size, increases.

1 This work was supported by NSF grant NCR-9725778 and the Colorado
Center for Information Storage, University of Colorado, Boulder, CO 80309.

Hence,  the  original  ISI  channel  is  treated  as  a  composition  of  NL 
sub-channels  which  are  to  be  coded  and  decoded  successively  using 
single-user  codes,  with  or  without  inter-  and  intra-channel  feedback. 

III.  Example 

Consider the response of the 2 kft-AWG26 channel (i.e., $h_0 = 1$,
$h_1 = -0.6$, $h_2 = -0.15$, $h_3 = -0.12$, $h_4 = -0.05$, $h_5 = 0.00$,
and $h_6 = 0.05$) [1], which is operating at a coded-symbol
signal-to-noise ratio of $p/\sigma^2 = 18.0$ dB. The following figure compares
the theoretical rate of information transmission for several schemes.
The average mutual information is plotted as a function of the total
number of sub-channels, $NL$. It is evident that increasing the vector
length $L$ can provide substantial gains for each scheme presented and
that there is an advantage in implementing inter-channel feedback in
addition to intra-channel feedback.
Abstract — In this paper, partitioning for the
SA(B,C) algorithm on intersymbol interference (ISI)
channels is considered. Substantial savings in complexity
can be made by using the SA(B,C), while
achieving almost optimal error performance.

I.  Introduction 

A receiver that uses the SA(B,C) algorithm [2] for intersymbol
interference (ISI) additive white Gaussian noise (AWGN)
channels is considered. SA stands for Search Algorithm. The
SA(B,C) partitions the states in the trellis into C state classes.
Then, proceeding breadth first in the trellis, the detector selects
the B paths closest to the received signal for each state class.
The number of computations per released symbol, which is to
be minimized, is proportional to BC, the number of paths
traced. The SA(B,C) family of algorithms performs maximum
likelihood sequence detection (MLSD) under given structural
and complexity constraints [2]. The Viterbi algorithm (VA) [1]
performs complexity-unconstrained MLSD. The performance
of the SA(B,C) detector is here required to be asymptotically
optimal (AO) [2], [3], i.e., the error event probability should
approach that of unconstrained MLSD when the signal-to-noise ratio
(SNR) $\to \infty$. Given the parameter B, there will be constraints
on how to construct the partition, i.e., on which states can
belong to the same state class. Finding an optimum partition
is in its general form an NP-hard problem; e.g., when B = 1
the problem is equivalent to the graph coloring problem [3].
Here the size of the problem of finding a partition is limited
by imposing structural constraints on the partition.

The  cases  B  <  S;C  =  1  and  B  =  1;  C  <  S  are  considered 
in  [2]  and  [3],  respectively.  S  is  the  number  of  states  in  the 
trellis.  The  case  B  <  S;C  =  1  is  optimal  with  respect  to 
complexity  [2].  Here  the  results  in  [2]  and  [3]  are  generalized 
to  the  case  B  =  2;  C  >  1.  This  method  can  be  generalized  to 
apply  for  an  arbitrary  B. 

II.  System  Description 

The message to be sent over an ISI AWGN channel is a sequence
$a_N = (a_0, a_1, \ldots, a_{N-1})$ of statistically independent,
equally probable data symbols drawn from an $M$-ary alphabet.
The MLSD finds the candidate sequence having the minimum
log likelihood metric given the received sequence. This
can be calculated recursively using the VA or the SA(B,C).
The states in the trellis in which the detectors operate are
given by $\sigma_n = (a_{n-1}, \ldots, a_{n-L+1})$, where $L - 1$ is the memory
of the channel. For large SNR the error event probability of
the SA(B,C) detector can be approximated as [2]

$$\Pr(\text{error}) \approx K_1 Q\left(\sqrt{d_{\min}^2\, \mathrm{SNR}}\right) + K_2 Q\left(\sqrt{d_{v,\min}^2\, \mathrm{SNR}}\right),$$

where $d_{v,\min}^2$ is the minimum vector Euclidean distance,
$d_{\min}^2$ is the minimum Euclidean distance for the VA, and $K_1$
and $K_2$ are constants. By requiring that $d_{v,\min}^2 \ge d_{\min}^2$, the
SA(B,C) detector will be AO.

1 This work was supported by TFR under Grant 96-396.

III.  Partitioning 

To find a constrained partition, consider the states written
in the form $\sigma_n = (a_{n-1}, a_{n-2}, \ldots, a_{n-L+1})$. Next, define the
partition vector [3] $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_{L-1})$, where $\Gamma_k$, $1 \le k \le
L - 1$, denotes a partition of the symbol alphabet for the $k$th
position in the state vector. Let $\gamma_k$ be the number of subsets
defined by partition $\Gamma_k$. Each subset is identified by a label in
the range $0, 1, \ldots, \gamma_k - 1$. No connection is assumed between
the partitions $\Gamma_k$. The partition vector may be employed to
map every state $\sigma_n$ into a corresponding vector of subset labels
$\Lambda_n = (\lambda_{n1}, \lambda_{n2}, \ldots, \lambda_{n,L-1})$, where $\lambda_{nk}$, $1 \le k \le L - 1$, is
the subset label of $a_{n-k}$ in the partition $\Gamma_k$. A state class is
defined as the set of states that map onto the same subset
vector: $D^{(i)} = \{\sigma : \sigma \to \Lambda^{(i)}\}$, $i = 1, 2, \ldots, C$.
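
The mapping from states to state classes can be sketched in a few lines. In the example below each alphabet partition is represented as a dictionary from symbol to subset label; this representation, the function name `state_classes`, and the 4-ary toy alphabet are assumptions made for illustration and are not taken from [3].

```python
from collections import defaultdict
from itertools import product


def state_classes(alphabet, partition_vector):
    """Group trellis states by their vector of subset labels.

    partition_vector[k - 1] maps each symbol to its subset label in the
    partition used for the k-th position of the state vector.
    """
    classes = defaultdict(list)
    for state in product(alphabet, repeat=len(partition_vector)):
        labels = tuple(part[sym] for part, sym in zip(partition_vector, state))
        classes[labels].append(state)
    return dict(classes)


if __name__ == "__main__":
    alphabet = [0, 1, 2, 3]
    first = {0: 0, 1: 1, 2: 0, 3: 1}    # two subsets: {0, 2} and {1, 3}
    second = {0: 0, 1: 0, 2: 0, 3: 0}   # one subset: the whole alphabet
    classes = state_classes(alphabet, [first, second])
    for labels, states in sorted(classes.items()):
        print(labels, len(states))
```

When every label combination occurs, the number of state classes C is the product of the subset counts of the individual partitions, here 2 x 1 = 2 classes of 8 states each.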

The first step in constructing a partition is to obtain some
finite set, $P$, of alphabet partitions from which the partition
vector is to be constructed. Finding $P$ is a problem that only
has to be solved once for each alphabet. This can be done
by imposing a certain degree of regularity on the alphabet
partitions, by considering rotational invariance and difference
symbols [3]. The next step is to find the states that have to
be partitioned into different state classes. The final step is
to perform a search for a minimum partition vector $\Gamma$ among
the set of vectors $(\Gamma_1, \Gamma_2, \ldots, \Gamma_{L-1})$ with elements $\Gamma_k \in P$
for the given constraints. This can be done by performing
either an exhaustive search or a tree search, since the sequences
$\Gamma_1, \Gamma_2, \ldots, \Gamma_{L-1}$ can be represented by a tree, where every
node in the tree represents a unique partition, since e.g.
$(\Gamma_1, \Gamma_2) = (\Gamma_1, \Gamma_2, 0)$. The complexity BC for some channels
of various lengths for the 8PSK alphabet is shown in Table
1, using the VA and the SA(B,C) resulting in AO.

Tab.  1:  The  complexity  for  partitions  resulting  in  AO  for  8PSK. 
Abstract — We propose to model the packet activity
of a single IP address by the Middleton class A
noise model. Theoretical results and numerical simulations
indicate that the class A noise model captures
well the inter-arrival properties of packets, especially
in terms of long-range dependence (LRD), which is
widely observed in computer network traffic.

In  this  paper  we  consider  the  micro-structure  of  Local  Area 
Network  (LAN)  traffic,  that  is,  the  single  user  network  traffic 
up  to  the  packet  level.  An  accurate  model  for  traffic  is  a 
valuable  tool  in  queuing  theory  studies. 

Recent results based on high-definition network traffic
records suggest that high-speed data network traffic exhibits
LRD. Network traffic models can be divided into two categories.
The first category considers the macro-structure of
traffic. In this class, the mathematical models are fitted to
the network traffic statistics, without considering the detailed
data stream structure. Examples include the fluid flow model
and the fractal On/Off models. In the second category, the
network traffic is viewed as a point process that models the
data stream at the packet level. Various Markov-modulated
Poisson processes belong to this category. We refer to this kind of
modeling as micro-structure modeling. The proposed model
belongs to the second class.

Our statistical model is based on standard renewal processes
(SRP). An SRP is characterized by the distribution
of its inter-event times. The events are the arriving
packets in the network pipeline. The inter-event times are
independent random variables drawn from Middleton's class
A noise envelope. The motivation to investigate this kind of
model in the context of LAN traffic came from the distinctive
appearance of graphs corresponding to traffic data, and in
particular, time intervals between packet arrivals (see Fig. 1). We
were able to show analytically that such a model results in a
process with LRD, thus capturing the essential characteristic
of real traffic. In the sequel, we outline the proof of the main
result of the paper, that is, that a standard renewal process with
inter-event times modeled as class A noise envelope exhibits
LRD. By LRD, we here refer to the definition proposed in [1],
according to which the power spectrum density of the process,
if it exists, can be approximated by a power-law function.
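
To make the renewal construction concrete, the sketch below generates arrival times whose gaps are drawn from a class A envelope. It assumes the commonly quoted form of the normalized class A envelope as a Poisson-weighted mixture of Rayleigh densities, in which component m has weight e^{-A} A^m / m! and 2 sigma_m^2 = (m/A + Gamma') / (1 + Gamma'). That form, the function names, and the parameter values 0.02 and 0.03 are assumptions of this sketch, not a reproduction of the expressions in this paper.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_class_a_envelope(A, gamma, rng):
    """Draw one envelope value from the assumed Rayleigh-mixture form."""
    m = sample_poisson(A, rng)
    two_sigma2 = (m / A + gamma) / (1.0 + gamma)  # 2 * sigma_m^2
    u = 1.0 - rng.random()                        # uniform on (0, 1]
    return math.sqrt(-two_sigma2 * math.log(u))   # Rayleigh by inversion


def renewal_arrivals(n, A, gamma, seed=0):
    """Return n arrival times of a renewal process with class A gaps."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n):
        t += sample_class_a_envelope(A, gamma, rng)
        arrivals.append(t)
    return arrivals


if __name__ == "__main__":
    print(renewal_arrivals(10, A=0.02, gamma=0.03))
```

With a small A most gaps come from the low-variance m = 0 component and only occasionally from a much wider one, which produces the bursty appearance that motivates the model.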

The pdf of the class A noise envelope equals


wi(e)A  =  e  Aa 
Here, $\epsilon$, $\epsilon_0$ are normalized envelopes. The parameters
$(A_A, \Gamma'_A, \Omega_{2A})$ are called global parameters. In practice, they
all have physical meaning.
the power spectrum of the SRP becomes:

$$S_N(\omega) = \mu^2 \delta(\omega / 2\pi) + \frac{2\sqrt{2}\, Z_1}{\sqrt{\pi}\, e^{A_A}\, \omega\, (Z_1^2 + Z_2^2)} \qquad (3)$$

where $\mu$ is the inverse of the class A noise envelope mean value.

Although the power spectrum density cannot be obtained
in closed form, it can be numerically established that for small
$\omega$ (around unit frequency) the power spectrum density
behaves like a power-law function. When using Middleton's class
A noise to model inter-packet times, we will need the
non-normalized version of the model. The results presented above
can be easily extended to the scaled version. The estimation
of class A model parameters from real data is discussed in, for
example, [2].

Based on extensive studies involving real network traffic
data, not included here due to lack of space, it appears that the
class A noise model can capture well the packet-level activity
of a single user in a local area high-speed network. Given
the adjustable class A noise parameters, one can synthesize
the desired network traffic load traces, which have the same
characteristics as real-world traffic data.
Figure 1: Inter-packet distribution [prob(f > t0)] obtained
from single user qin.ece.drexel.edu, compared with Middleton
class  A  noise,  with  parameters  Aa  =  0.02  and  =  0.03.  X- 
axis  is  by  —0.5  log  (-  log  []),  Y-axis  is  by  10  log  Q,  i.e.  dB. 
Abstract — We are interested in the broader design
possibilities for statistical multiplexing when the
ATM streams to be merged are not only independent
but also many.

I.  Introduction  and  Notations 

Statistical multiplexing of $n$ independent input streams,
each of line rate $r$ and mean activity $m$, is considered, each being
long-range dependent to the same extent. Slotwise cyclic
scanning of the input streams is assumed at a rate $nr$, and
a single common output stream of line rate $R = lr < nr$.
($\lambda := l/n < 1$.) For fairness, a single cyclic shift per scanning
cycle of the stream scanned initially is assumed.

Upper and lower bounds are given on the total probability
$p_T$ of overflow (of cells from any stream, Theorem 1, [1]) for
any scanning cycle initiating an isolated burst of overflowing
cells only. (For isolated see [1].) The impact of merging many
streams is investigated for multiplexing by scanning and doing
nothing else (Version (ii)), and for multiplexing as usual,
including also a leaky bucket of length $l$ next to scanning
(Version (i)). Instead of investigating the relation between the
long-range dependence at the input and the burst length at
the output, a broad class of appropriate Pareto-kind template
distributions of overflowing cells is considered. A member of
this class is chosen in a best way (in a sense defined in [1])
to overbound the probability estimate that the burst length
of lost cells is exceeded. (For such an estimate, observations
should be available on the cells stored in the course of each
scanning.)

II.  Main  Results 

Given $p_T < 1$ and a jointly admissible pair $m$ and $\lambda$,
denote by $n_0(p_T)$ the least admissible number of input streams
$n$ from which upwards the total probability of any scanning cycle
initiating an isolated burst of overflowing cells does not exceed
$p_T$. From Theorem 1 ([1]) it follows that

$$n_{LB}(p_T) \le n_0(p_T) \le n_{UB}(p_T).$$

Here $n_{UB}(p_T) := \lceil u \rceil$, where $u$ stands for a
positive real, being the solution of the following equation:

$$\lg p_T^{-1} = u \left( D(P \| Q) - \varepsilon_B(u) - \varepsilon_{UB}(u) \right).$$

1This work was supported in part by the European Copernicus
Project No. COP579 (1995-1999), the Hungarian Telcomm. Foundation,
Grant No. 109 (1999), and a research professor visit of the
author, at the CATSS, UTDallas, TX, USA, Feb/March 1999.


The relative entropy underlying our present model is denoted by

$$D(P \| Q) = \lambda \lg \frac{\lambda}{m} + (1 - \lambda) \lg \frac{1 - \lambda}{1 - m}.$$

$P := (\lambda, 1 - \lambda)$ and $Q := (m, 1 - m)$ are the
underlying binary probability distributions. $\varepsilon_B(n)$,
$\varepsilon_{LB}(n)$ and $\varepsilon_{UB}(n)$ are positive, each
decreasing with increasing $n$ and approaching 0 as $n \to \infty$
(each precisely given in [1]). $\lg x$ stands for the base-10
logarithm of $x > 0$. For $n_{LB}(p_T) > 0$ see [1]. For the
probability $p$ per input stream, corresponding to $p_T$, the
following equation holds: $p = \frac{1}{n} p_T$ (Proposition 2,
[1]). For Version (i) the following upper bound is given on the
total probability $P_T$ of ATM cell loss due to leaky bucket
saturation, provided experience on the cell bursts is available
(Proposition 4, [1]):

$$P_T \le p_T \, P_{TUB}(l).$$

$P_{TUB}(l)$ stands for the upper bound on the total conditional
probability estimate of cell loss, given event $\Theta$, estimated
by a member of the Pareto-kind distribution class, selected
according to Section I. $\Theta$ occurs if the scanning cycle just
considered initiates an isolated burst of overflowing cells. The
probability of cell loss $P$ per stream is related, under a
realistic assumption, to $P_T$ as $p$ is to $p_T$ (Proposition 5,
[1]). Let $n = 2^{\nu}$ ($\nu = 1, 2, \ldots$). Denote by $\gamma$
the maximum admitted per-stream probability of cell loss and by
$\gamma_T$ the maximum admitted total probability of cell loss.
Let $\gamma = 10^{-9}$ and $\gamma_T = 10^{-7}$ (assuming
$n > 100$). Then the least number of input streams still admitted
is n = 2* > $n_{UB}$ for $p_T < \gamma_T = 10^{-7}$, for a design
with scanning only. However, even $n = 2^7$ might be admissible,
even with $p_T = 10^{-4}$ < jt, for a design with a leaky bucket
next to scanning. This might be so if (i) not only the estimate of
the conditional probability (given $\Theta$) of the cell loss due
to bucket saturation can be overbounded by $P_{TUB}(l) < 10^{-3}$,
but (ii) the upper bound can also be tolerated on the conditional
probability estimate (given $\Theta$) that a still tolerable burst
length of cells, lost during bucket saturation, is exceeded.
Single- and bi-variate large deviation relations are considered
for finitely many terms in Theorem 1 and Proposition 4. Obviously
a study in finite terms is indispensable for estimating
$n_0(p_T)$. One might expect, under two realistic assumptions,
Version (i) to offer, even for $n > n_{UB}(\gamma_T)$, more room
for bearing long-range dependence. (For background references and
acknowledgements, and for notions, assumptions, and all proofs
see [1].)
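As a small numerical aid to the quantities used above, the following Python sketch evaluates the base-10 binary relative entropy between (lambda, 1 - lambda) and (m, 1 - m) and a first-order stream count obtained by solving lg(1/p_T) = u * D for u. This is only a sketch under stated assumptions: the correction terms of [1] are not reproduced here and are simply dropped, so the result is at best indicative for large n; the function names are hypothetical.

```python
import math


def relative_entropy_lg(lam: float, m: float) -> float:
    """Binary relative entropy D(P||Q) with base-10 logarithms,
    for P = (lam, 1 - lam) and Q = (m, 1 - m)."""
    if not (0.0 < lam < 1.0 and 0.0 < m < 1.0):
        raise ValueError("lam and m must lie strictly between 0 and 1")
    return (lam * math.log10(lam / m)
            + (1.0 - lam) * math.log10((1.0 - lam) / (1.0 - m)))


def first_order_stream_count(p_t: float, lam: float, m: float) -> int:
    """Smallest integer u with u * D(P||Q) >= lg(1 / p_t).

    Assumption: the epsilon correction terms given in [1] are ignored.
    """
    if not 0.0 < p_t < 1.0:
        raise ValueError("p_t must lie strictly between 0 and 1")
    d = relative_entropy_lg(lam, m)
    if d <= 0.0:
        raise ValueError("D(P||Q) is zero: lam and m coincide")
    return math.ceil(math.log10(1.0 / p_t) / d)
```

For instance, `first_order_stream_count(1e-7, 0.5, 0.3)` uses the hypothetical values lambda = 0.5 and m = 0.3, which are not taken from the text; a smaller target probability can only increase the returned count.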

References 

[1] S. Cs., Preprint under the same title.




0-7803-5857-0/00/$10.00 ©2000 IEEE.


ISIT 2000, Sorrento, Italy, June 25-30, 2000


Delay analysis for prioritized service of variable rate regenerative traffic sources

Michael  Shalmon 


ABSTRACT  NOT  AVAILABLE  AT  THE  TIME  OF  PRINT 




Scheduling  for  Fair  Allocation  of  Rates  in  Multirate  Multicast  Networks 

Saswati  Sarkar  and  Leandros  Tassiulas 

Dept. of Electrical and Computer Engineering and Institute for Systems Research
University  of  Maryland,  College  Park,  MD,  USA 
email  addresses:  swati@eng.umd.edu,  leandros@isr.umd.edu 


I.  Extended  Summary 

We study fair allocation of service rates for real time loss
tolerant traffic in arbitrary networks with multicast
capabilities. Multicasting poses some specific fairness
challenges. The fairness objective is that every receiver
receives service at a rate commensurate with its capability
and the capacity of the path from the source. Hence,
different receivers should receive information at different
rates. The source encodes the signal into several layers
that can be incrementally combined to provide progressive
refinement. Every receiver must receive the most significant
layer (layer 1) for basic information. If a receiver
has additional bandwidth, it can subscribe to other layers
for better reception quality. A layer carries meaningful
information only when all the more significant layers have
been successfully decoded. Thus the objective is to provide
fair rates of service to the receivers, and to limit the
packet losses to the less significant layers.

We have previously proposed distributed algorithms for
computing the fair rate allocation [1]. Once the fair rates
are known, many congestion control policies such as fair
queueing can be used to attain the computed rates. However,
rate computation requires exact knowledge of system
parameters, such as link bandwidths. In general, the
schedulers at the nodes may not have exact knowledge of
this link capacity. It is also necessary to exchange
messages between neighboring nodes, which increases the
information overhead. We propose a scheduling policy which
attains the maxmin fair rates without computing them
beforehand. In addition to guaranteeing fairness, this policy
confines packet losses to less significant layers, and
protects the more important layers when there is a shortage of
bandwidth. Furthermore, this policy does not require any
knowledge of traffic statistics, is computationally simple,
and is essentially local information based.

II. Scheduling Policy

We propose a scheduling policy based on prioritized
round robin with window flow control for multirate multicast
networks. Let session $i$ traverse link $l$. Then
the “logical buffer” $B_{(i,k,l)}(t)$ denotes the number of
layer $k$ packets of session $i$ waiting for transmission in
link $l$ at time $t$. Logical buffers $B_{(i,k,l)}(t)$ are
monitored at each node, for every session $i$ traversing the
node (Figure 1). A window parameter ($W$) is associated with
the policy.

Fig. 1. Each session transmits two layers only. We show the
logical buffers associated with the source and destination of link
$e_3$. For example, $B_{124}$ consists of session 1 layer 2 packets
waiting for transmission in link $e_4$. Consider the scheduling of
link $e_3$. Here, $T_1(e_3) = \{e_4, e_5\}$ and T2(e_3) = {eg}.
Sessions 1 and 2 are sampled in round robin order. When session 1
is sampled, it sends the most significant layer (layer 1) packet
if $B_{113}(t) > 0$ and $\min(B_{114}(t), B_{115}(t)) < W$.
Otherwise, it tries to send a layer 2 packet. It sends a layer 2
packet if $B_{123}(t) > 0$ and $\min(B_{124}(t), B_{125}(t)) < W$.
If it cannot send a layer 2 packet, it passes its chance to
session 2. Now session 2 tries to send a lowest layer packet
first, and so on. Let $W = 5$, $B_{113}(t) = 2$, $B_{123}(t) = 1$,
$B_{114}(t) = 5$, $B_{124}(t) = 2$, $B_{115}(t) = 7$,
$B_{125}(t) = 6$. In this case, session 1 is not able to transmit
a layer 1 packet. However, session 1 transmits a layer 2 packet.

All sessions traversing a link are sampled in round robin
order. Consider a session $i$ traversing link $l$. When session
$i$ is sampled, it first tries to send a packet of the most
significant layer (layer 1). If it does not succeed, it tries
to send the second most significant layer packet (layer 2),
and so on. If all layers of session $i$ are exhausted, the
scheduler switches to the next session in the round robin
order. Let $T_i(l)$ be the set of links originating from the
destination of link $l$ that lie on the path of session $i$. A
$k$-layer packet of session $i$ will be successfully transmitted
at time $t$ if

1. no session $i$ packet from layers $1, \ldots, k - 1$ can be
transmitted,

2. a $k$-layer session $i$ packet is waiting at the source
node of the link, for transmission in the link
($B_{(i,k,l)}(t) > 0$), and

3. at least one of the logical buffers for the $k$th layer of
session $i$ at the destination node of the link has fewer
than $W$ packets (i.e., $\min_{l' \in T_i(l)} B_{(i,k,l')}(t) < W$).
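The three conditions above can be sketched as a small decision function. This is an illustrative Python sketch, not the authors' implementation: the dictionary layout, the function name `layer_to_send`, and the treatment of a link with no downstream links (assumed to always have room) are assumptions made here.

```python
from typing import Dict, Optional, Sequence, Tuple

# Logical buffer occupancies keyed by (session, layer, link).
Buffers = Dict[Tuple[int, int, int], int]


def layer_to_send(session: int, link: int, next_links: Sequence[int],
                  num_layers: int, buffers: Buffers,
                  window: int) -> Optional[int]:
    """Return the layer whose packet the sampled session transmits on
    `link`, or None if the session must pass its turn.

    Layers are tried from most significant (1) to least significant, so
    layer k is only reached when layers 1..k-1 could not be sent.
    """
    for layer in range(1, num_layers + 1):
        # Condition 2: a packet of this layer waits at the source node.
        waiting = buffers.get((session, layer, link), 0) > 0
        # Condition 3: some downstream logical buffer holds fewer than
        # `window` packets (assumed true when there is no downstream link).
        occupancy = min(
            (buffers.get((session, layer, nxt), 0) for nxt in next_links),
            default=0,
        )
        if waiting and occupancy < window:
            return layer
    return None


# Numbers from the Fig. 1 example: W = 5, link 3 scheduled, T_1 = {4, 5}.
fig1_buffers: Buffers = {
    (1, 1, 3): 2, (1, 2, 3): 1,
    (1, 1, 4): 5, (1, 2, 4): 2,
    (1, 1, 5): 7, (1, 2, 5): 6,
}
fig1_choice = layer_to_send(1, 3, [4, 5], 2, fig1_buffers, 5)
```

With these values layer 1 is blocked because min(5, 7) is not below the window of 5, while layer 2 passes because min(2, 6) = 2, so `fig1_choice` is 2, in line with the caption of Figure 1.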

Refer  to  figure  1  for  an  illustrative  example.  We  have 
proved  that  for  all  sufficiently  large  window  and  physical 
buffer  sizes,  this  policy  allocates  the  maxmin  fair  rates  to 
all  receivers  of  all  sessions.  Note  that  congestion  related 
packet  loss  is  possible  at  any  node.  The  policy  offers  an 
inherent  priority  to  more  significant  layers  of  a  session  at 
every  node.  Thus  the  presence  of  less  significant  layers 
is  transparent  to  more  significant  layers.  In  fact  we  have 
shown  analytically  that  the  more  significant  layers  suffer 
negligible  packet  loss,  and  the  packet  losses  are  confined 
to  the  least  significant  layer  served. 
Abstract — This work addresses the analysis of motion
embedded in spatio-temporal digital signals as well as motion
taking place in the outer space $\mathbb{R}^3 \times \mathbb{R}$.
Three categories of motion are considered and referred to as
translational, rotational or deformational. In each category,
motion parameters are defined from all the temporal derivatives,
i.e. position, velocity and accelerations. Motion analysis means
not only detection, estimation, interpolation, and tracking but
also motion-compensated filtering, signal decomposition, and
selective reconstruction. In this context, we show how all motion
models can be derived from Lie groups and how group
representations define continuous wavelets in the functional
space of the signals. Motion detection, estimation and
interpolation are based on continuous wavelet transforms.
Selective motion tracking is based on the adjunction of a
variational principle of optimality. The optimality principle
defines the trajectory or the geodesic and provides the
appropriate PDE of wavelet motion, the tracking equation (ODE),
the selective constants of motion to be tracked, and all the
symmetries to be imposed on the system. The Green functions of
these PDEs give rise to the converse operators, i.e. wavelet
propagators and kernels of integral equations. These integral
equations have several applications: (1) putting a still wavelet
on a trajectory to perform velocity or motion-oriented filtering,
and (2) achieving the motion compensation of a signal. This work
in fact investigates the harmonic analysis associated with motion
groups, which leads to special functions, spectral signatures and
propagators, and yields motion-based detection and velocity or
motion-oriented filtering. This motion analysis applies to both
deterministic and stochastic processes. Finally, spatio-temporal
discrete wavelets can be derived from their continuous cognates
as the orthonormal bases that perform signal decompositions along
the trajectory and achieve selective reconstructions of moving
patterns of interest.

I.  Motion  Models  and  Assumptions 

The entire construction for this signal analysis rests on
defining a Lie group or a Lie algebra of transformations and an
Euler-Lagrange equation. In short, three assumptions have to be
given: a law of composition, its inverse, and a principle of
optimality from the calculus of variations. For each class of
motion transformations, we consider the parameters of position,
velocity and accelerations. For translational motion, the spatial
position, velocity and accelerations are considered along with
the temporal translation. For (circular) rotational motion in
two-dimensional space, the parameters of angular position,
velocity and accelerations will be denoted in the order of the
Taylor expansion $\theta_i \in \mathbb{R}$; $i \in \mathbb{Z}_+$.
A variant of circular rotation is hyperbolic rotation, which is
denoted by $\phi_i$ instead of $\theta_i$. The rotations are
expressed through unitary matrices of transformation, namely

$$R(\theta_i \tau^i) = \begin{pmatrix} \cos(\theta_i \tau^i) & -\sin(\theta_i \tau^i) \\ \sin(\theta_i \tau^i) & \cos(\theta_i \tau^i) \end{pmatrix}$$

with the $x_1^2 + x_2^2$ invariance for circular rotations. For
deformational transformation, the parameters of velocity and
accelerations will be taken into account. The zero-order
deformation is the most important. This is the scale which
provides multiresolution analyses on space and time respectively.
The matrix of deformation is defined as

$$A = \begin{pmatrix} e^{a_i \tau^i} & 0 \\ 0 & e^{a_i \tau^i} \end{pmatrix}.$$

In case of general transformations taking place in $\mathbb{R}^n$
on homogeneous surfaces (spheres, hyperboloids) or on smooth
manifolds, the scale parameter becomes a matrix $A$. These
transformations rely on groups within $GL(m, \mathbb{R})$.
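The two planar transformation matrices of this section can be illustrated numerically. The Python sketch below is written under assumptions: the matrix argument is read as a parameter times the i-th power of time (theta_i * tau**i for rotation, a_i * tau**i for deformation), which is an interpretation of the damaged source, and all function names are hypothetical. It only shows that a circular rotation leaves x1^2 + x2^2 unchanged while the deformation rescales both coordinates.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotation(theta_i: float, tau: float, order: int) -> Matrix:
    """Circular rotation by the angle theta_i * tau**order (assumed form)."""
    angle = theta_i * tau ** order
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def deformation(a_i: float, tau: float, order: int) -> Matrix:
    """Isotropic scaling by exp(a_i * tau**order) (assumed form)."""
    scale = math.exp(a_i * tau ** order)
    return [[scale, 0.0], [0.0, scale]]


def apply(matrix: Matrix, x: List[float]) -> List[float]:
    """Multiply a 2x2 matrix by a 2-vector."""
    return [matrix[0][0] * x[0] + matrix[0][1] * x[1],
            matrix[1][0] * x[0] + matrix[1][1] * x[1]]


def squared_norm(x: List[float]) -> float:
    """The quadratic form x1^2 + x2^2 preserved by circular rotations."""
    return x[0] * x[0] + x[1] * x[1]
```

For example, `squared_norm(apply(rotation(0.7, 2.0, 1), [3.0, 4.0]))` stays at 25 up to rounding, whereas the deformation multiplies the squared norm by exp(2 * a_i * tau**order).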
Abstract — Accurate channel-quality estimates are needed
for a variety of reasons in wireless communication systems, such
as for power control and for adaptive transmission and routing.
Channel-quality information can be derived from many sources
at the receiver, including statistical characterizations of the
channel, information from the demodulation process, and
information from the error-correcting and error-detecting codes.
One simple method for estimating the channel quality is to
estimate the channel error rate by re-encoding the outputs of the
error-correcting code and comparing the re-encoded symbols to
hard decisions at the demodulator output. In the presentation, we
will present analysis and simulation results for several
different channel quality estimates derived from estimates of the
channel error rate.

I.  Introduction 

In this paper we consider channel quality estimates based on
estimates of the channel error rate. An estimate of the channel
error rate for a system employing error-correcting codes can be
determined by comparing hard-decision outputs of the demodulator
to re-encoded symbols from the output of the decoder [1],[2]. We
consider the performance of such estimates for convolutionally
encoded data transmitted with binary phase-shift-keying. Consider
a system that uses binary transmission over an additive white
Gaussian noise channel. The information to be transmitted is
convolutionally encoded and transmitted in blocks of $N$ bits.
The channel causes $B \ge 0$ channel symbol errors to occur, as
measured by hard decisions at the output of the demodulator. The
receiver re-encodes the output of a Viterbi decoder and compares
it to hard decisions at the output of the demodulator. The number
of differences between these encoded streams is labeled $B'$ and
is an estimate of $B$, and thus can be used to estimate the
channel error rate. If no errors occur at the output of the
Viterbi decoder, $B' = B$. If errors do occur at the output of
the decoder, $B' \ne B$, and $B'$ may not give an accurate count
of the number of channel symbol errors that occurred. The
probability of a block having multiple event errors is much
higher for systems that employ adaptive transmission techniques
or have highly dynamic channels than for other systems. For many
systems, additional information is available to determine whether
the output of the Viterbi decoder is in error. For instance,
error-detecting codes are often necessary to validate that the
received block is correct. This additional information can be
used to improve the accuracy of the error counts.
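The re-encoding count described above can be sketched in a few lines of Python. The code below is illustrative only: it assumes a rate-1/2, constraint-length-3 convolutional code with octal generators (7, 5), which is a choice made here and not taken from the paper, and it takes the decoder output as given instead of implementing a Viterbi decoder. The function names are hypothetical.

```python
from typing import List, Sequence


def conv_encode(bits: Sequence[int]) -> List[int]:
    """Rate-1/2 convolutional encoder, generators 7 and 5 (octal)."""
    s1 = s2 = 0  # two-bit shift register, initially cleared
    out: List[int] = []
    for b in bits:
        out.append(b ^ s1 ^ s2)  # generator 111
        out.append(b ^ s2)       # generator 101
        s1, s2 = b, s1
    return out


def reencoding_error_count(hard_decisions: Sequence[int],
                           decoded_bits: Sequence[int]) -> int:
    """Count B': positions where the demodulator hard decisions differ
    from the re-encoded decoder output."""
    reencoded = conv_encode(decoded_bits)
    if len(reencoded) != len(hard_decisions):
        raise ValueError("block lengths do not match")
    return sum(x != y for x, y in zip(hard_decisions, reencoded))


# Hypothetical block: two channel symbol errors (B = 2) at positions 3, 10.
info = [1, 0, 1, 1, 0, 0, 1, 0]
received = conv_encode(info)
for position in (3, 10):
    received[position] ^= 1

count_correct_decoding = reencoding_error_count(received, info)

# Same block, but with one decoder output bit in error.
wrong_decoding = list(info)
wrong_decoding[4] ^= 1
count_wrong_decoding = reencoding_error_count(received, wrong_decoding)
```

When the decoder output equals the transmitted information, the count equals the number of channel errors (2 here). With one decoded bit in error the re-encoded stream changes in several positions, so the count no longer matches the true number of channel errors.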

The number of bit errors that occur in a block can be used to
generate several different estimates of channel quality,
including estimates of the channel error rate, estimates of the
signal-to-noise ratio, or other estimates. In this summary, we
consider estimates for the bit energy-to-noise density ratio
based on error counts from comparing the re-encoded outputs of a
Viterbi decoder to hard-decision outputs of the demodulator. The
estimates also employ knowledge from an error-detecting code
about whether the block was successfully decoded. The estimates
of the bit energy-to-noise density ratio that we consider are of
the form $\hat{\mathcal{E}} = f(b)$, where $b$ denotes the counted
number of differences between the hard-decision demodulator
outputs and the re-encoded decoder outputs for a packet. For
instance, for the maximum a posteriori (MAP) estimate,

$$\hat{\mathcal{E}}_{MAP} = \arg\max_{e} P(\mathcal{E} = e \mid B' = b).$$

For the minimum mean-square error (MMSE) estimate,

$$\hat{\mathcal{E}}_{MMSE} = E\{\mathcal{E} \mid B' = b\}.$$

We use several analytical techniques to determine approximations
for these estimates. For instance, it is intractable to consider
all of the possible multiple-event errors, so we use the approach
taken in the analysis of turbo codes and determine a weight
profile for the equivalent block code [3]. Our approach also
requires the probability that an event error occurs given that a
certain number of the decision statistics are in error. This
involves calculating the probability that a sum of non-Gaussian
decision statistics is less than zero. Our approach is to use a
Gaussian approximation for the sum, and simulations show that
this approximation provides sufficient accuracy for many cases.
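To make the MAP and MMSE definitions above concrete, here is a minimal Python sketch under simplifying assumptions that are not the paper's analysis: the bit energy-to-noise density ratio takes values on a finite grid (in dB) with a given prior, the error count is modeled as binomial with the uncoded BPSK symbol error probability Q(sqrt(2 R Eb/N0)) for code rate R, decoder errors are ignored, and the MMSE average is taken in the dB domain. All names are hypothetical.

```python
import math
from typing import List, Sequence


def channel_error_prob(ebn0_db: float, rate: float) -> float:
    """Hard-decision BPSK symbol error probability for code rate `rate`."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    x = math.sqrt(2.0 * rate * ebn0)
    return 0.5 * math.erfc(x / math.sqrt(2.0))  # Gaussian Q-function


def log_binomial_pmf(b: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of b errors."""
    return (math.lgamma(n + 1) - math.lgamma(b + 1) - math.lgamma(n - b + 1)
            + b * math.log(p) + (n - b) * math.log1p(-p))


def posterior(b: int, n: int, grid_db: Sequence[float],
              prior: Sequence[float], rate: float) -> List[float]:
    """Posterior over the grid given b counted differences in n symbols."""
    logs = [math.log(w) + log_binomial_pmf(b, n, channel_error_prob(e, rate))
            for e, w in zip(grid_db, prior)]
    peak = max(logs)  # subtract the peak to avoid underflow
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def map_estimate(grid_db: Sequence[float], post: Sequence[float]) -> float:
    """Grid value with the largest posterior probability."""
    return grid_db[max(range(len(post)), key=lambda i: post[i])]


def mmse_estimate(grid_db: Sequence[float], post: Sequence[float]) -> float:
    """Posterior mean over the grid (averaged in dB, an assumption)."""
    return sum(e * w for e, w in zip(grid_db, post))
```

A caller would build a grid such as 0 to 8 dB in 1 dB steps with a uniform prior and pass the counted differences b and block length n; a larger count shifts both estimates toward lower values of the ratio.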

II.  Conclusions 

In  the  presentation,  we  will  present  results  for  channel  quality 
estimates  derived  from  error  counts  from  re-encoding  the  output  of 
an  error-correcting  decoder.  We  consider  estimates  that  also  employ 
information  from  an  error-detecting  code  that  provides  information 
about whether the packet decoded correctly. A framework for
analyzing the accuracy of such estimators for convolutionally encoded
data  is  derived,  and  results  are  presented  that  illustrate  the  accuracy 
of  several  different  types  of  estimators  for  different  channels.  Our 
results  indicate  that  for  the  minimum  mean-square  estimate  of  the 
signal-to-noise  ratio,  channel  error  counts  of  the  type  discussed  in 
this  paper  yield  mean  squared  errors  in  the  0.5  dB  to  2.5  dB  range 
depending  on  the  range  and  distribution  of  the  actual  signal-to-noise 
ratios.  We  will  also  illustrate  the  performance  of  these  estimates  for 
some  example  adaptive  signaling  schemes. 
Abstract — In this paper, we consider a kind of
discrete-time linear systems with uncertain observations,
in which the additive noises of the state and
observation equations are correlated with each other.
By using the Orthogonal Projection Theorem, a recursive
algorithm to obtain the least mean-squared
error polynomial estimator for the state of these
systems is proposed.

I.  Introduction 

In the estimation theory developed by Kalman, it is assumed
that, at any time, the signal to be estimated is contained
in the observations. However, in many practical situations,
such as communication systems, there may be a nonzero
probability (false alarm probability) that any observation
consists of noise alone; this may be caused by an intermittent
failure in the observation mechanisms.

These situations are described by a system whose observation
equation includes not only an additive noise, but also
a multiplicative noise component, modelled by a sequence of
Bernoulli random variables. These systems have been investigated
under the topic of Systems with Uncertain Observations.
In these systems, even if the noises are Gaussian, the conditional
expectation is not a linear function of the observations
and it requires an exponentially growing memory for its
computation (Jaffer and Gupta [2]). Consequently, for this class
of systems, attention has been directed to suboptimal estimators.

The linear estimation problem in systems with uncertain
observations, when the interruption process is a binary
independent sequence, was treated by Nahi [4]. Later on, Hermoso
and Linares [3] extended the results of Nahi to the case when
the state and measurement noises are correlated at consecutive
instants of time.

More recently, García-Ligero et al. [1] have studied the
quadratic estimation problem in systems with uncertain
observations under the hypothesis of mutual independence of the
noise and the initial state.

In this paper we consider systems with uncertain observations
when the additive noises of the state and the observation
are correlated at the same instant of time. At an earlier stage,
we proposed to approach the linear estimation problem in
these systems, which had not yet been studied, and subsequently
to obtain estimators which improve on the linear one. Finally,
we have approached the least mean-squared error polynomial
estimation problem in these systems as a whole.

1This work has been supported by the “Comisión Interministerial
de Ciencia y Tecnología” under contract PB98-1286.


This study generalizes the work of García-Ligero et al. [1] in
two directions: on the one hand, the independence hypothesis
of the noises is weakened and, on the other hand, polynomial
estimators of an arbitrary order $\nu$ ($\nu \ge 1$) are considered.

II.  Polynomial  Estimation  Problem 
In order to approach the aforementioned optimal $\nu$th-order
polynomial estimation problem, we define a new system (augmented
system), whose state and observation vectors are obtained
as the aggregate of the original vectors and their Kronecker
powers up to the $\nu$th order. Thus the least mean-squared
error linear estimator of the augmented state based on
the augmented observations provides the optimal polynomial
estimator for the state of the original system.
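The augmentation step described above can be illustrated with a short Python sketch that stacks a vector with its Kronecker powers up to a chosen order. This is only a plain stacking under assumptions: any centering or scaling of the powers that the full derivation may use is not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence


def kron(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Kronecker product of two vectors."""
    return [x * y for x in a for y in b]


def kron_power(x: Sequence[float], order: int) -> List[float]:
    """Kronecker power of x: x, x (x) x, x (x) x (x) x, ... for order >= 1."""
    result = list(x)
    for _ in range(order - 1):
        result = kron(result, x)
    return result


def augment(x: Sequence[float], nu: int) -> List[float]:
    """Stack x with its Kronecker powers up to order nu."""
    if nu < 1:
        raise ValueError("the polynomial order nu must be at least 1")
    stacked: List[float] = []
    for order in range(1, nu + 1):
        stacked.extend(kron_power(x, order))
    return stacked
```

For a vector of dimension n the augmented vector has n + n^2 + ... + n^nu entries, so `augment([1.0, 2.0], 2)` gives `[1.0, 2.0, 1.0, 2.0, 2.0, 4.0]`. A linear estimator acting on such augmented vectors is a polynomial of degree nu in the original entries, which is the idea behind the reduction to a linear problem.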

The problem thus reduces to obtaining the least mean-squared
error linear estimator for the state of the augmented
system. This system has uncertain observations, and the state
and observation noises are correlated with each other. Hence,
the recursive algorithms proposed by Nahi [4] and Hermoso
and Linares [3] cannot be applied since the augmented system
does not satisfy the required conditions for their application.

By using the Orthogonal Projection Theorem, a recursive
algorithm to obtain the optimal linear estimator for the state
of the augmented system is proposed. This algorithm, which
generalizes the Nahi algorithm, sets up recursive equations
which allow us to obtain the linear filter as a function of the
linear predictor and reciprocally. It should also be noted that
the computation of the error covariance matrices is independent
of the computation of the estimators. This allows us to quantify
the goodness of the estimation without having to calculate the
estimators explicitly.

Finally, as we have indicated above, the optimal polynomial
estimator for the state of the original system is obtained
from the optimal linear estimator for the state of the augmented
system.
Abstract — We study the asymptotic behavior of
the Bayesian estimator of the parameters of a Hidden
Markov Model (HMM) with continuous or finite
observations and finite state space.

I.  Introduction 

Maximum Likelihood (ML) is still the most popular approach
to parameter estimation for HMM’s. The established
technique for the computation of the ML is the use of an
algorithm of the EM type, which has poor convergence properties
and is computationally expensive. In this paper we consider
the optimal mean square (Bayesian) estimator as a possible
alternative to ML. We study the asymptotic properties of the
Bayesian estimator and prove its consistency under standard
hypotheses of identifiability of the model class and of
positivity of the prior probability. We briefly consider the
algorithmic aspects and provide an explicit formula for the case
of finitely valued observations. Whether Bayesian estimators
constitute, from the computational point of view, a viable
alternative to ML is a question that still has to be settled.

II.  Statistical  Model 

Let $\{X_n, n \ge 0\}$ and $\{Y_n, n \ge 0\}$ be two sequences,
defined on a probability space $(\Omega, \mathcal{F}, P)$, with
values in the finite set $S = \{1, \ldots, N\}$ and $\mathbb{R}^d$
respectively. The statistical model is a parametric class of
HMM’s defined as follows. On the space $(\Omega, \mathcal{F})$ we
consider a family $(P_\theta, \theta \in \Theta)$ of probability
measures, with $\Theta$ a compact subset of $\mathbb{R}^p$, such
that under $P_\theta$ the unobserved (hidden) state sequence
$X_n$ is a Markov chain with transition probability matrix
(t.p.m.) $Q_\theta = (q^\theta_{ij})$, i.e.
$q^\theta_{ij} = P_\theta[X_{n+1} = j \mid X_n = i]$, and initial
probability distribution $\pi_0 = (\pi_{0,i})$ independent of
$\theta \in \Theta$, and possibly different from the true
probability distribution $\pi$ of $X_0$. The observations $Y_n$
are mutually independent given the sequence of states, i.e.

$$P_\theta[Y_n \in dy_n, \ldots, Y_0 \in dy_0 \mid X_0 = i_0, \ldots, X_n = i_n] = \prod_{k=0}^{n} P_\theta[Y_k \in dy_k \mid X_k = i_k].$$

We assume, moreover, that the model set contains $P$, the true
measure, i.e. that there exists $\alpha \in \Theta$ such that
$P = P_\alpha$.

III. Bayesian Estimation

In the Bayesian approach to estimation a prior distribution,
with density say $\nu(\cdot)$, is assigned on the parameter space
$\Theta$. The Bayesian (optimal m.s.e.) estimator is given by
$\hat{\theta}_n = E[\theta \mid Y^n]$ where the expectation is
computed with respect to the posterior density
$p(\theta \mid Y^n)$. Our main theorem generalizes to HMM’s a
classical result on the asymptotic behavior of the posterior
density [4]. Let us define
$\ell(\theta) = \lim_{n \to \infty} \frac{1}{n} \log p_\theta(y_0^n)$.

Theorem If $\ell(\cdot)$ has a unique maximum at $\alpha$, and if
the prior density $\nu(\cdot) > 0$ everywhere, then the Bayesian
estimator $\hat{\theta}_n$ is consistent.

The proof is an application of the Laplace expansion technique,
and requires the development of a uniform version of
the Shannon-McMillan-Breiman Theorem for HMM’s, which
is of independent interest. Heuristically one can observe that,
for any $\varepsilon > 0$, asymptotically the estimator
$\hat{\theta}_n$ is well approximated by

$$\frac{\int \theta \exp\{n(\ell(\theta) + \varepsilon)\}\, \nu(\theta)\, d\theta}{\int \exp\{n(\ell(\theta) + \varepsilon)\}\, \nu(\theta)\, d\theta}.$$

The limit for $n \to \infty$ can be identified using the Laplace
asymptotic expansion of the integrals. The assumption that
$\ell(\theta)$ has a unique maximum at $\alpha$ (identifiability
assumption) allows us to conclude that $\hat{\theta}_n$ is
consistent. The technical results on which the proof is based can
be found in [1], [2], [3].

IV. Explicit Form of the Estimator

A more explicit expression of the Bayesian estimator can be
given by properly choosing the prior density $\nu(\cdot)$. In the
special case of finitely valued observations and parameter
$\theta$ coinciding with the t.p.m. $Q_\theta$ of $X_n$, one can
adopt the Dirichlet prior [5],

$$\nu_D(\theta) = \prod_{i} \frac{\Gamma(k/2)}{\Gamma(1/2)^k} \prod_{j} q_{ij}^{-1/2},$$

where $\Gamma(\cdot)$ denotes the Gamma function. A long but
straightforward algebraic manipulation gives

Lemma The estimator $\hat{\theta}_n$ corresponding to
$\nu_D(\theta)$ is given, componentwise, by

$$E[q_{ij} \mid Y_1^n] = \frac{\sum_{X_1^n} \frac{N_{ij}(X_1^n) + 1/2}{N_i(X_1^n) + k/2}\, Q(X_1^n, Y_1^n)}{\sum_{X_1^n} Q(X_1^n, Y_1^n)}$$

where $Q(X_1^n, Y_1^n)$ is the a-posteriori measure, i.e.

$$Q(X_1^n, Y_1^n) = \int P_\theta(X_1^n, Y_1^n)\, \nu_D(\theta)\, d\theta,$$

and $N_{ij}(X_1^n)$ denotes the number of transitions $i \to j$
in $X_1^n$.
Abstract — In this paper Sudan’s algorithm is
modified into an efficient method to list-decode a
class of codes which can be seen as a generalization
of Reed-Solomon codes. The algorithm is specialized
into a very efficient method for unique decoding.
The code construction can be generalized based
on algebraic-geometry codes and the decoding algorithms
are generalized accordingly. Comparisons with
Reed-Solomon and Hermitian codes are made.

I.  Introduction 

The  minimum  distance  is  not  the  only  measure  of  the  usability 
of  a  code.  For  practical  purposes  it  is  important  that  there 
exist  an  efficient  decoding  method  to  make  use  of  the  error- 
correcting  capability,  and  it  is  important  that  error-patterns 
which  are  likely  to  occur  in  the  actual  application  are  usually 
corrected  by  the  decoder. 

In [1] a series of new distance functions on vectors over finite
sets is introduced and some codes which are good with
respect to this distance are constructed. However, decoding
methods are not discussed. This paper provides efficient methods
for unique decoding and for list-decoding of the codes presented
in [1] which are based on Reed-Solomon and algebraic-geometry
codes. The methods are based on Sudan's improved
algorithm (see [2]).


II.  The  codes 

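Let $\mathbb{F}_q$ denote a finite field with q elements and suppose that

$$P := \{P_1, \ldots, P_n\} \subseteq \mathbb{F}_q \ \text{ with } |P| = n \qquad (1)$$

Consider a polynomial $f \in \mathbb{F}_q[x]$. Given some $P_i \in P$ we can
write

$$f = \sum_{j=0}^{\deg(f)} f_{j,i}\,(x - P_i)^j \quad \text{with } f_{j,i} \in \mathbb{F}_q$$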

Definition 1 Let r be a positive integer and let $0 < k < rn$.
Then define the following error-correcting code:

$$C(P,r,k) = \{ f(P,r) \mid \deg(f) < k \}$$

with P being as in (1) and

$$f(P,r) := (f_{0,1}, \ldots, f_{r-1,1};\ f_{0,2}, \ldots, f_{r-1,2};\ \ldots;\ f_{0,n}, \ldots, f_{r-1,n})$$

III.  The  distance 

In $C(P,r,k)$ each codeword consists of n chunks of r field elements,
where each chunk corresponds to an element in P. This
structure is reflected in the following definition of r-distance:


Definition 2 Let r be a positive integer and let $u, v \in \mathbb{F}_q^{rn}$
with $u = (u_0, \ldots, u_{rn-1})$ and $v = (v_0, \ldots, v_{rn-1})$. For $i \in
\{0, \ldots, n-1\}$ define the r-distance, $d_r(u,v,i)$, between u and
v with respect to the i'th chunk as follows:

$$d_r(u,v,i) := r - \min\{ j \ge 0 \mid j = r \ \lor\ u_{ir+j} \ne v_{ir+j} \}$$

Furthermore, define the r-distance, $d_r(u,v)$, between u and v:

$$d_r(u,v) := \sum_{i=0}^{n-1} d_r(u,v,i)$$
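As an illustration of Definition 2, the r-distance can be computed chunk by chunk: a chunk contributes r minus the position of its first disagreement, and 0 if the chunks agree. This is a minimal sketch; the function names are hypothetical and the vectors are plain Python sequences of field elements compared by equality.

```python
def chunk_distance(u, v, i, r):
    """r-distance of u and v with respect to the i'th chunk (0-based)."""
    for j in range(r):
        if u[i * r + j] != v[i * r + j]:
            return r - j  # first disagreement at offset j inside the chunk
    return 0  # the chunks agree, so the minimum is attained at j = r

def r_distance(u, v, r):
    """Sum of the chunk distances over all n = len(u) / r chunks."""
    if len(u) != len(v) or len(u) % r != 0:
        raise ValueError("u and v must have the same length, a multiple of r")
    return sum(chunk_distance(u, v, i, r) for i in range(len(u) // r))

u = (0, 0, 0, 1, 2, 3)
v = (0, 1, 0, 1, 2, 4)
distance = r_distance(u, v, r=3)
```

For r = 1 the function reduces to the Hamming distance, since every chunk then contributes either 1 or 0.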

The following theorem (a special case of [1, Theorem 6])
gives the main parameters of the code $C(P,r,k)$:

Theorem 3 $C(P,r,k)$ is a linear code of length rn and dimension
k. Furthermore, the minimum r-distance between
two different codewords in $C(P,r,k)$ is $rn - k + 1$.

IV.  Decoding 

In the paper Sudan's improved algorithm is modified to decode
the code $C(P,r,k)$ beyond half the minimum r-distance.
The following theorem holds:

Theorem 4 Let $s \ge 1$ be a given parameter and let $b_s$ satisfy

$$\binom{b_s}{2} \le \frac{rn \binom{s+1}{2}}{k-1} < \binom{b_s+1}{2}$$

Then the algorithm finds a list of all codewords within r-distance
$t_s$ from the received word, where $t_s = rn - \lfloor l_s/s \rfloor - 1$
with $l_s := \lfloor (rn\,s(s+1) + (k-1)\,b_s(b_s-1)) / (2b_s) \rfloor$. Furthermore,
the number of codewords on the list is at most $b_s - 1$.

For sufficiently large s it can be seen that $t_s \approx rn(1 - \sqrt{k/(rn)})$.
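The parameters of Theorem 4 are easy to tabulate numerically. The sketch below assumes the theorem reads as follows: b_s is the integer with C(b_s, 2) <= rn C(s+1, 2)/(k-1) < C(b_s+1, 2), l_s = floor((rn s(s+1) + (k-1) b_s (b_s-1)) / (2 b_s)), and t_s = rn - floor(l_s/s) - 1. The function name is hypothetical and k >= 2 is assumed so that the bound on b_s is well defined.

```python
from math import comb

def list_decoding_parameters(rn, k, s):
    """Return (b_s, l_s, t_s) for code length rn, dimension k, parameter s."""
    if k < 2 or s < 1:
        raise ValueError("this sketch assumes k >= 2 and s >= 1")
    target = rn * comb(s + 1, 2)
    b = 1
    # Increase b until comb(b, 2)*(k-1) <= target < comb(b+1, 2)*(k-1).
    while comb(b + 1, 2) * (k - 1) <= target:
        b += 1
    l = (rn * s * (s + 1) + (k - 1) * b * (b - 1)) // (2 * b)
    t = rn - l // s - 1
    return b, l, t

b_s, l_s, t_s = list_decoding_parameters(rn=16, k=4, s=1)
```

With rn = 16 and k = 4 the minimum r-distance is 13, so unique decoding reaches r-distance 6, while the sketch gives t_1 = 7 with at most b_1 - 1 = 2 codewords on the list.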

V.  Conclusion 

In [4] the code construction and decoding algorithm are
generalized to a setting resembling algebraic geometry one-point
codes. The generalization makes use of so-called increasing
zero bases (see [3], Theorem 1), giving a code construction
slightly different from [1].
Abstract — In this paper we give a construction of
all binary duadic codes of length $n = p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$,
by which all binary duadic codes of given length can
be enumerated.

I.  Introduction 

Quadratic residue (Q.R.) codes are error-correcting codes
with good performance [5]. Leon et al. [3] introduced a
new family of binary cyclic codes, called duadic codes, which
not only include Q.R. codes as a subset, but also have properties
analogous to those of Q.R. codes. Leon et al. [3] also
proved that binary duadic codes of length n exist if and only
if $n = \prod_i p_i^{m_i}$, where each $p_i \equiv \pm 1 \pmod 8$ (see [3, 4]). Ding
et al. [1] constructed and enumerated all of the binary duadic
codes of prime length by presenting a cyclotomy to a prime
$p \equiv \pm 1 \pmod 8$, and Ding [2] gave the construction and enumeration
of all binary duadic codes of length $p^m$. However,
the problem of constructing and enumerating the duadic codes
with length $n = p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$ has remained open.
In this paper, we completely solve this problem.

II.  Main  result 

In the sequel, we use $\delta_n(a)$ to denote the multiplicative
order of a modulo n.

We present a cyclotomic approach to the construction of
all binary duadic codes of length $p^m$, by which the number of
all binary duadic codes of length $p^m$ is given.

Result 1 Let $p \equiv \pm 1 \pmod 8$ be a prime, and $p \in P_{e_1} = \{p :
(p - 1)/\delta_p(2) = 2e_1\}$, $e_1 = 2^s e_0$, $e_0$ odd, and let m be a
positive integer. Then the number of splittings of $p^m$ is

$$N(p^m) = \prod_{j=0} 2^{-1} \left(2^{2^{j} e_0}\right)^{e/e_1},$$

where $e_k = \phi(p^k)/(2\delta_{p^k}(2))$, $e = \sum_{k=1}^{m} e_k$. Thus the number of
duadic codes of length $p^m$ is $4N(p^m)$.

Furthermore, we give a construction of all binary duadic
codes of length $n = p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$, by which all binary
duadic codes of given length can be enumerated.

Let $2T(l)$ denote the number of non-zero 2-cyclotomic
cosets of $p_1^{m_1} p_2^{m_2} \cdots p_l^{m_l}$, and $2t_i$ denote the number of
non-zero 2-cyclotomic cosets of $p_i^{m_i}$.

1This work was supported by National Natural Science Foundation
of China (No. 69802002, 69882002 and 69772035), and by
National "863" (No. 863-306-ZT05-05-2).


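For any $l \ge 1$, if $\gcd(\delta_{p_i}(2), \delta_{p_j}(2)) = 1$ for $i \ne j$, $i,j \in
\{1, 2, \ldots, l\}$, then

$$T(l) = \sum_{i_1} t_{i_1} + 2 \sum_{i_1 < i_2} t_{i_1} t_{i_2} + 2^2 \sum_{i_1 < i_2 < i_3} t_{i_1} t_{i_2} t_{i_3} + \cdots + 2^{l-1} t_1 t_2 \cdots t_l$$

$$\phantom{T(l)} = \sum_{k=1}^{l} 2^{k-1} \Bigl( \sum_{i_1 < i_2 < \cdots < i_k} t_{i_1} t_{i_2} \cdots t_{i_k} \Bigr) \qquad (1)$$

where $i_j \in \{1, 2, \ldots, l\}$ for $j = 1, 2, \ldots, l$.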
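Equation (1) can be checked on a small case by counting cyclotomic cosets directly. The sketch below is an illustration only: the function names are hypothetical, the coset count is a brute-force enumeration for odd n, and the example n = 7 · 17 is chosen because the multiplicative orders of 2 modulo 7 and 17 (3 and 8) are coprime.

```python
from itertools import combinations
from math import prod

def nonzero_cyclotomic_cosets(n, q=2):
    """Count the q-cyclotomic cosets of Z_n other than {0}; needs gcd(q, n) = 1."""
    seen = set()
    count = 0
    for x in range(1, n):
        if x not in seen:
            count += 1
            y = x
            while y not in seen:  # walk the orbit x, qx, q^2 x, ... modulo n
                seen.add(y)
                y = (y * q) % n
    return count

def T_from_formula(t):
    """Evaluate Equation (1): sum over k of 2^(k-1) times the k-th elementary
    symmetric polynomial of t_1, ..., t_l."""
    return sum(
        2 ** (k - 1) * sum(prod(c) for c in combinations(t, k))
        for k in range(1, len(t) + 1)
    )

t1 = nonzero_cyclotomic_cosets(7) // 2    # 2*t_1 cosets for p_1 = 7
t2 = nonzero_cyclotomic_cosets(17) // 2   # 2*t_2 cosets for p_2 = 17
T2 = T_from_formula([t1, t2])
```

Here t_1 = t_2 = 1 and the formula gives T(2) = 4, matching the 8 non-zero 2-cyclotomic cosets modulo 119 found by direct enumeration.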

Result 2 Let $n = p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$, with $p_i \equiv \pm 1 \pmod 8$ a
prime for $i = 1, 2, \ldots, r$. Let $N(x)$ denote the number of the splittings
of x, $2t_i$ denote the number of non-zero 2-cyclotomic
cosets of $p_i^{m_i}$, $2T(l)$ denote the number of non-zero 2-cyclotomic
cosets of $p_1^{m_1} p_2^{m_2} \cdots p_l^{m_l}$, and

$$T = \sum_{l=1}^{r-1} \bigl( T(l)\, t_{l+1} - 1 \bigr).$$

Then

$$N(n) \ge (2^T)^2 \cdot 8^{r-1} \cdot N(p_1^{m_1}) \cdot N(p_2^{m_2}) \cdots N(p_r^{m_r})$$

and the number of duadic codes of length n is at least $4N(n)$,
where the equality is achieved if $\gcd(\delta_{p_i}(2), \delta_{p_j}(2)) = 1$ for
$i \ne j$; in this case $T(l)$ can be obtained from Equation (1)
for $l = 1, 2, \ldots, r-1$. In fact, $N(p_i^{m_i})$ can be obtained from
Result 1.
Abstract — The structure of cocyclic Hadamard matrices
allows us a much faster and more systematic
search for binary, self-dual codes. The search for binary
self-dual [40,20] codes from $Z_5 \times Z_2^2$-cocyclic
Hadamard matrices and two types of $D_{20}$-cocyclic
Hadamard matrices resulted in 25 equivalence classes
of extremal doubly-even codes. It is worth noting
that the equivalence classes found in each case were
disjoint, emphasising the importance of the cocyclic
structure of the Hadamard matrices used.


I.  Introduction 

Given a Hadamard matrix H of order $n = 8s + 4$, if the
number of +1's in each row and column of H is $\equiv 3 \pmod 4$
then the matrix $[I, \bar{H}]$ generates a binary, doubly-even, self-dual
$[2n, n]$ code C, where $\bar{H} = (H + J)/2$, I is the identity
matrix, J is the all 1's matrix of order n (see [4]).

If in addition H is of the shape

$$\begin{pmatrix} -1 & 1 \cdots 1 \\ 1 & \\ \vdots & H' \\ 1 & \end{pmatrix} \qquad (1)$$

then $H'$ is a (+1, -1)-incidence matrix of a symmetric
Hadamard 2-(n-1, n/2, n/4) design, thus satisfying for $n > 4$
the condition required to produce doubly-even codes.


II. Self-dual codes from $Z_5 \times Z_2^2$ cocyclic
Hadamard matrices

Here we consider H to be a $Z_5 \times Z_2^2$-cocyclic Hadamard
matrix.

From [2] the structure of a $Z_t \times Z_2^2$-cocyclic matrix, t odd,
is equivalent to a $t \times t$ block-backcirculant matrix W with top row
$W_1, W_2, \ldots, W_t$, where

$$W_i = \begin{pmatrix}
n_i & x_i & y_i & z_i \\
x_i & A n_i & z_i & A y_i \\
y_i & K z_i & B n_i & BK x_i \\
z_i & AK y_i & B x_i & ABK n_i
\end{pmatrix}$$

[1] gives the conditions which make the search for self-dual
codes more efficient than any known searches. [1] also includes
a list of codes obtained from a preliminary search from these
Hadamard matrices.

The Hadamard matrices were also converted into the equivalent
(1) form and used to produce more codes from $Z_5 \times Z_2^2$-cocyclic
Hadamard matrices. Two equivalence classes of extremal
doubly-even $Z_5 \times Z_2^2$-cocyclic [40,20] codes were found,
one using the (1) form.


III. Self-dual codes from $D_{4t}$


In [3] Flannery details the conditions for the existence of
a Hadamard matrix cocyclic over $D_{4t}$, the dihedral group of
order $4t$, $t \ge 1$, given by the presentation

$$\langle a, b \mid a^{2t} = b^2 = (ab)^2 = 1 \rangle$$


Cocyclic Hadamard matrices developed over $D_{4t}$ exist only
in the case $(A,B,K) = (1,-1,1), (1,-1,-1), (-1,1,1)$ for t
odd, where A and B are the inflation variables and K is the
transgression variable. The matrices for $(A,B,K) = (1,-1,1)$
and $(1,-1,-1)$ possess the most tractable block structure and
are the only cases dealt with here.

In the case $(A,B,K) = (1,-1,1)$, if there is a cocyclic
Hadamard matrix associated with a cocycle in this class, then
t is the sum of two squares. The cocyclic Hadamard matrices
have the form

$$\begin{pmatrix} M & N \\ N C_{2t} & -M C_{2t} \end{pmatrix}$$

where the matrices M and N are $2t \times 2t$ back circulant matrices and $C_{2t}$ is
the back circulant $2t \times 2t$ permutation matrix with first row

1 0 0 0 ... 0.


The case $(A,B,K) = (1,-1,-1)$ is very prolific, giving
more Hadamard matrices cocyclic over $D_{4t}$ for t odd than
the case above. The cocyclic Hadamard matrices are of the form

$$\begin{pmatrix} M & N \\ ND & -MD \end{pmatrix},$$

where the $2t \times 2t$ matrices M and N
are each the entrywise product of a back circulant and back
negacyclic matrix (hence are symmetric), and D is the $2t \times 2t$
matrix obtained by negating every noninitial row of $C_{2t}$.

Generator matrices built from both the cocyclic forms and the
(1) form were used. Again it was seen that other equivalence
classes were obtained by using the (1) form. In general it was
found that there were more equivalence classes obtained from
the dihedral construction than from the $Z_5 \times Z_2^2$ construction.

In the case $(A,B,K) = (1,-1,1)$ there were two equivalence
classes, one obtained by using the (1) form, with 6400
codes in each class.

In the case $(A,B,K) = (1,-1,-1)$ a total of 5621 extremal
codes were found, divided into 21 equivalence classes. Usage of
the (1) form resulted in 11200 codes in another two equivalence
classes.
Abstract — We consider messages represented as matrices.
The term rank norm of a matrix is defined as the minimal
number of lines (rows and columns) which cover all the nonzero
entries of the matrix. We propose a family of codes correcting
term rank errors. These codes are optimal since
they reach the Singleton-type bound.

I.  Introduction 

In digital communication, messages are often represented
as matrices. For example, in FDMA or OFDM
systems, information is transmitted through a system of
parallel channels. A message can be considered an $N \times n$
matrix where N is the number of channels and n is the
duration of transmission in number of symbols.

The model in which the most probable event is a corruption
of a row or a column is considered. A formal
description of such errors is given.

For simplicity we restrict our consideration to the binary
case. Let X be an $N \times n$ binary matrix. Let $w(X)$
be the minimal number of lines (rows or columns) which
cover all the nonzero entries of the matrix. The number
$w(X)$ is known as the term rank of the matrix X. This
notion is introduced and used in combinatorial matrix
theory [3]. The term rank function $w(X)$ on the set of all
matrices of the given size is in fact a norm.
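The term rank can be computed without enumerating line covers. By König's theorem, the minimal number of rows and columns covering all nonzero entries equals the size of a maximum matching between rows and columns along nonzero entries. The sketch below uses a simple augmenting-path search; the function name is hypothetical and the matrix is a list of equal-length rows.

```python
def term_rank(matrix):
    """Term rank w(X): size of a maximum row-column matching on nonzero entries."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    match_col = [-1] * cols  # match_col[c] = row currently matched to column c

    def try_row(r, visited):
        # Try to match row r, re-routing previously matched rows if needed.
        for c in range(cols):
            if matrix[r][c] and not visited[c]:
                visited[c] = True
                if match_col[c] == -1 or try_row(match_col[c], visited):
                    match_col[c] = r
                    return True
        return False

    return sum(1 for r in range(rows) if try_row(r, [False] * cols))

X = [[1, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
w = term_rank(X)
```

In this example the first row and the first column cover every nonzero entry, so the term rank is 2 although the matrix has five nonzero entries.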

The concept of term rank distance for coding theory
was introduced in 1971 (see [1], [2]; the term "lattice-pattern
errors" was used there instead of "term rank
errors").

The maximal norm is $w_{\max} = \min\{N, n\}$. The term
rank distance between X and Y is defined as $d(X,Y) =
w(X - Y)$. Let C be a code, i.e., any set of matrices of
given size. The term rank distance of a code is defined as

$$d = d(C) := \min_{M_i \ne M_j} \{ w(M_i - M_j) \mid M_i \in C,\ M_j \in C \}$$

A code with term rank distance d can correct up to
$(d - 1)/2$ term rank errors.

Let $M = |C|$ be the cardinality of the code C. The
rate of the code is defined as $R := \frac{\log_2 M}{Nn}$.

The next Lemma gives the Singleton-type upper
bound.

Lemma 1: Let C be a matrix code of size $N \times n$, rate
R, and term rank distance d. Then

$$R \le 1 - \frac{d-1}{w_{\max}} = 1 - \frac{d-1}{\min\{N, n\}} \qquad (1)$$


A code C is said to be a Maximal Term Rank Distance
(MTRD) code if it satisfies (1) with
the equality sign.

It can be easily shown that codes for rectangular matrices
can be derived from codes for square matrices.

II.  Construction  of  Codes 

Note: Rank codes proposed in [4] can also correct term
rank errors. We consider more general codes which are
not rank codes.

A. Known Codes of Term Rank Distance n and 2.

The codes described here are proposed in [1] and [2].

B. New Optimal Codes of Term Rank Distance 3 and n - 1

We generalise properties of previous codes in this case.
Lemma 2: A code C is an MTRD [n, n-2, 3] code if and
only if any n - 2 rows of the general code matrix can be
considered as information rows and any n - 2 columns of
the general code matrix can be considered as information
columns.

Let $C^\perp$ be the dual code of C.

Lemma 3: The code $C^\perp$ is an MTRD [n, 2, n-1] code.
The general code matrix of an MTRD [n, n-2, 3] code
can be represented in two equivalent forms. The full construction
is given and illustrated by example.

III.  Decoding 

We also discuss decoding methods including majority
decoding and soft decision decoding based on the
method of trellis decoding similar to those described in
[5].
Abstract — The Hamming-distance related lattice of
subcodes of a linear code C is represented by a subcode
graph. The dimensions of these subcodes and the
dimensions of the subcodes of the dual are related by
MacWilliams-like identities. The coordinate permutation
problem for minimum trellis-complexity is approached
by introducing suitable vertex functions on
the subcode graph that reflect the trellis-complexity
measure. This approach gives a simple new proof
for well-known results on maximum-distance separable
(MDS) codes and a slight sharpening of the Wolf
bound for a large class of binary codes.

I.  The  Subcode  Graph 

Let $C \subseteq F^n$ be a linear (n, k) block code over a field F.
For each subset $s = \{s_1, \ldots, s_t\}$ of the codeword components
$\{1, \ldots, n\} = I$ of cardinality t, we consider the t-dimensional
subspace $F^s \subseteq F^n$ with support in s and the subcode with
support in s, $C_s = C \cap F^s$. If $s' \subseteq s$, then there is an inclusion
map $\iota_{s',s} : C_{s'} \to C_s$. The lattice structure of the subcodes $C_s$
can be illustrated by a subcode graph, which is defined as follows.
The vertices of the graph are the subspaces $C_s$. There is
a directed edge from vertex $C_{s'}$ to vertex $C_s$ whenever $s' \subset s$
and the cardinalities of the two sets satisfy $|s'| + 1 = |s|$. Note
that the subcode graph is actually a trellis with n trellis sections,
where $|s| = t$ represents the time index.

II.  MacWilliams-like  Identities 

Using the techniques of the original proof No. 1 of the
MacWilliams Identity [1], one can derive some simple MacWilliams-like
identities, which hold for arbitrary fields.

In analogy to the full weight enumerator of a linear (n, k)
code [2], we define the full dimension enumerator by

$$\alpha(Y,Z) = \alpha(Y_1, \ldots, Y_n, Z_1, \ldots, Z_n) = \sum_{s \subseteq I} \dim(C \cap F^s)\, Y_s Z_{I \setminus s}$$

where s runs through all subsets of cardinality $t = 0, 1, \ldots, n$
and $Y_s$ and $Z_{I \setminus s}$ denote the monomials $Y_{s_1} \cdots Y_{s_t}$ and
$Z_{t_1} \cdots Z_{t_{n-t}}$, $t_i \in I \setminus s$, resp. The dimension enumerator
of the code C is defined by

$$\alpha(Y) = \sum_{s} \dim(C \cap F^s)\, Y^{|s|}.$$

Theorem 1 Let $\alpha(Y)$ and $\beta(Y)$ ($\alpha(Y,Z)$ and $\beta(Y,Z)$) be
the (full) dimension enumerators of a linear (n, k)-code C and
its dual $C^\perp$, resp. Then, the following identities hold:

$$Y^n \beta(Y^{-1}) = \alpha(Y) + \sum_{l=0}^{n} (n - k - l) \binom{n}{l} Y^l$$

$$\beta(Z,Y) = \sum_{l=0}^{n} (n - k - l) \sum_{s,\, |s| = l} Y_s Z_{I \setminus s} + \alpha(Y,Z).$$
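The first identity of Theorem 1 can be verified numerically on a small binary code, reading it coefficientwise as $\beta_{n-l} = \alpha_l + (n-k-l)\binom{n}{l}$, where $\alpha_l$ and $\beta_l$ are the coefficients of $Y^l$. The sketch below is an illustration under stated assumptions: the field is GF(2), the helper names are hypothetical, and $\dim(C \cap F^s)$ is obtained as k minus the rank of the generator matrix restricted to the columns outside s.

```python
from itertools import combinations
from math import comb

def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def restricted_rank(matrix, columns):
    """GF(2) rank of the submatrix formed by the given columns."""
    columns = list(columns)
    return gf2_rank(
        sum(row[c] << idx for idx, c in enumerate(columns)) for row in matrix
    )

def dimension_enumerator(matrix):
    """Coefficients a[l] = sum over |s| = l of dim(C intersect F^s)."""
    n = len(matrix[0])
    dim = restricted_rank(matrix, range(n))
    coeffs = [0] * (n + 1)
    for size in range(n + 1):
        for s in combinations(range(n), size):
            outside = [c for c in range(n) if c not in s]
            coeffs[size] += dim - restricted_rank(matrix, outside)
    return coeffs

def identity_holds(G, H):
    """Check beta[n-l] == alpha[l] + (n-k-l)*C(n,l) for generator G, parity check H."""
    n = len(G[0])
    k = restricted_rank(G, range(n))
    alpha, beta = dimension_enumerator(G), dimension_enumerator(H)
    return all(
        beta[n - l] == alpha[l] + (n - k - l) * comb(n, l) for l in range(n + 1)
    )

G = [[1, 1, 1]]             # [3,1] repetition code
H = [[1, 1, 0], [0, 1, 1]]  # its dual, the [3,2] even-weight code
ok = identity_holds(G, H)
```

For the repetition code the enumerators are $\alpha(Y) = Y^3$ and $\beta(Y) = 3Y^2 + 2Y^3$, and both sides of the identity equal $2 + 3Y$.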


Remarks: 1.) Theorem 1 holds for any field and can be
extended to codes that are projective modules over abelian
artinian rings. 2.) The lowest degree nonzero term of the
dimension enumerator specifies the minimum distance $d_{\min}$
and the corresponding codeword multiplicity $\mu_{\min}$.

III. The Permutation Problem

For a generic linear block code it is computationally difficult
to find a permutation of the codeword components that
results in a minimal trellis [3], [4]. We will approach the permutation
problem by introducing suitable vertex functions on
the subcode trellis:

• the dimension vertex function is given by $\kappa(C_s) =
\dim C_s + \dim C_{I \setminus s}$ and

• (in case of a finite field) the enumerator vertex function
is given by $e(C_s) = |C_s| \cdot |C_{I \setminus s}|$.

Each path from the starting node to the ending node in
the subcode trellis corresponds to a chain of support sets $\emptyset =
s^{(0)} \subset s^{(1)} \subset \cdots \subset s^{(n)} = I$ and this determines an ordering
of the codeword components. Thus, permutation problems
for a given linear block code can be transformed into a path
search problem on the subcode trellis. E.g., looking for an
optimal permutation of the coordinates that minimizes the
maximum state space complexity is equivalent to finding a
path through the subcode trellis, which goes through a vertex
with maximum value $\kappa(C_s)$ of the dimension vertex function.

This approach allows one to give a simple alternative proof,
using Theorem 1, to show that permuting the codeword components
of an MDS code does not change the dimension of
the state spaces of its minimal trellis, which is well-known [4].
Moreover, for a large class of codes, the Wolf bound (Theorem 5.5
in [4]) can be slightly sharpened as follows.

Proposition 1 Let C be a binary linear (n, k) code. If either

(i) $2 < k < n/2$ and the all-one word is in C, or

(ii) $n/2 < k < n - 2$ and the all-one word is in $C^\perp$

then the maximum state complexity $K_s$ is upper bounded by
$K_s \le \min\{k, n - k\} - 1$.
I. Introduction

A  tail  biting  trellis  for  a  code  is  a  trellis  with  multiple 
starting  and  ending  states  which  has  the  following  structures: 
(1)  the  starting  and  ending  state  spaces  are  identical;  (2)  every 
starting  state  has  a  unique  ending  state  and  they  are  the  same 
state;  and  (3)  a  path  in  the  trellis  is  a  valid  codeword  if  and 
only  if  its  starting  and  ending  states  are  identical  [1,  2].  For 
a  block  code,  tail  biting  trellis  representation  may  result  in  a 
significant  reduction  in  trellis  complexity  [2]. 

II.  General  Structure 

The general structure of an L-section tail biting trellis T
for an (n, k) linear block code C is depicted in Figure 1. Let
$\{0, 1, \ldots, L\}$ denote the set of state boundary locations. Suppose
T consists of $2^m$ starting states and $2^m$ ending states.
We may view T as a union of $2^m$ isomorphic subtrellises which
share a common part from boundary location (BL)-$t_1$ to BL-$t_2$,
where $t_1 < t_2$. Each subtrellis consists of those paths in T
that connect a state at BL-0 to the same state at BL-L, and
it has three parts: the header, the center span and the tail.
The center span is shared by every subtrellis. For $1 \le i \le 2^m$,
let $T_i$ denote the subtrellis whose starting and ending states
are $s_0^{(i)}$ and $s_L^{(i)}$, respectively. Assume that $T_1$ contains the
all-zero path. Then the paths in $T_1$ form an (n, k - m) linear
subcode of C, denoted $C_1$, and the paths in any other
subtrellis form a coset of $C_1$ in C. Let $v = (v_1, v_2, \ldots, v_n)$
be a codeword in C. Since all the subtrellises have the same
common span from BL-$t_1$ to BL-$t_2$, there must be a codeword
$w = (w_1, w_2, \ldots, w_n)$ in each subtrellis whose components from
location-$(t_1 + 1)$ to location-$t_2$ are zeros. For convenience, we
call the part of the first $t_1$ components of w the header and the part
of the last $n - t_2$ components of w the tail. Adding w to each path
in $T_1$ results in a subtrellis which is isomorphic to $T_1$ and is
identical to $T_1$ from BL-$t_1$ to BL-$t_2$. The header and the tail
of this subtrellis are obtained by adding the header and the
tail of w to the header and the tail of $T_1$, respectively. This
subtrellis is the trellis for the coset $w + C_1$ of $C_1$ and w is the
coset representative.

Although all the subtrellises share a common span from BL-$t_1$
to BL-$t_2$, two individual subtrellises may share a longer
span from BL-i to BL-j with $0 \le i \le t_1$ and $t_2 \le
j \le L$. For $0 \le i < j \le L$, let $[i,j]$ denote the interval $\{i, i+1,
\ldots, j\}$. The zero-span of an n-tuple $v = (v_1, v_2, \ldots, v_n)$ is
defined as the largest interval $[i,j]$ such that $v_{i+1} = v_{i+2} =
\cdots = v_j = 0$. This definition implies that $v_i = v_{j+1} = 1$.
Let v be a codeword in C but not in $C_1$ whose zero-span
is $[i,j]$ with $0 \le i \le t_1$ and $t_2 \le j \le L$. It is clear that
$[t_1, t_2] \subseteq [i,j]$. Let $T_1(v)$ denote the subtrellis for the coset
$v + C_1$ obtained by adding v to every path in $T_1$. Then $T_1(v)$
and $T_1$ have a common span from BL-i to BL-j. Let v and w
be codewords in two different cosets of the partition $C/C_1$. Let
$[i_1, j_1]$ and $[i_2, j_2]$ be the zero-spans of v and w, respectively.
Let $[i_3, j_3] = [i_1, j_1] \cap [i_2, j_2]$. Then the co-subtrellises, $T_1(v)$
and $T_1(w)$, are isomorphic and have a common span from
BL-$i_3$ to BL-$j_3$.
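The zero-span and the common span of two coset representatives can be computed directly from the definition. The sketch below is illustrative: the function names are hypothetical, components are numbered 1, ..., n as in the text, ties between equally long zero runs are resolved in favour of the leftmost run, and a vector without zeros is given the empty span (0, 0).

```python
def zero_span(v):
    """Return (i, j) such that v_{i+1} = ... = v_j = 0 is the longest zero run."""
    best = (0, 0)
    run_start = None
    for pos, x in enumerate(v, start=1):  # pos is the 1-based component index
        if x == 0:
            if run_start is None:
                run_start = pos
            if pos - run_start + 1 > best[1] - best[0]:
                best = (run_start - 1, pos)
        else:
            run_start = None
    return best

def common_span(a, b):
    """Intersection [i3, j3] of two intervals, or None if they do not overlap."""
    i, j = max(a[0], b[0]), min(a[1], b[1])
    return (i, j) if i <= j else None

v = (1, 0, 0, 0, 1, 0, 1)
w = (1, 1, 0, 0, 0, 0, 1)
shared = common_span(zero_span(v), zero_span(w))
```

Here v has zero-span [1, 4] and w has zero-span [2, 6], so the corresponding co-subtrellises would share the span from BL-2 to BL-4.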


III.  Construction 

For $0 < t_1 < t_2 < L$, let $C(t_1, t_2)$ denote the set of codewords
in C which satisfy the following conditions: (1) each
nonzero codeword v in $C(t_1, t_2)$ has zero components from
location-$(t_1 + 1)$ to location-$t_2$, i.e., $v_{t_1+1} = v_{t_1+2} = \cdots =
v_{t_2} = 0$, and (2) the part of the first $t_1$ components of v contains
at least one nonzero component and the part of the last $n - t_2$
components of v contains at least one nonzero component.
Then $C(t_1, t_2)$ is a linear subcode of C. The zero-span of each
codeword in $C(t_1, t_2)$ contains $[t_1, t_2]$ as a subinterval. Let
m be the dimension of $C(t_1, t_2)$. There exists an (n, k - m)
linear subcode $C_1$ in C such that C is the direct sum of $C_1$
and $C(t_1, t_2)$. Let $C/C_1$ denote the partition of C modulo
$C_1$. Then the vectors in $C(t_1, t_2)$ can be used as the coset
representatives of the cosets in $C/C_1$.

Let $T_1$ be the minimal conventional bit-level trellis for $C_1$.
Form all the co-trellises $T_1(v)$ of $T_1$ with $v \in C(t_1, t_2)$. All
these co-trellises have a common span from BL-$t_1$ to BL-$t_2$.
Putting all these co-trellises together and sharing maximum
common spans between them, we obtain a tail biting trellis
with $2^m$ starting states and $2^m$ ending states. The overall
complexity of this tail biting trellis depends on the length of the
common span of the co-trellises and on the choice of the boundary
locations, $t_1$ and $t_2$, of the common span. These parameters
should be chosen to minimize the trellis complexity.
Figure 1: General structure of a tail biting trellis.
Abstract — For all linear (n, k, d) MDS codes over finite
fields $F_{p^m}$, we identify a generator matrix with the
property that the product of trellises of rows of the
generator matrix will give a minimal tail-biting linear
trellis, and viewing the code as a group code, identify
a set of generators, the product of whose trellises will give
a minimal tail-biting group trellis. We also give the
necessary and sufficient condition for the existence of
flat minimal linear and group tail-biting trellises.

I.  Introduction 

Trellis representations of block codes illuminate the structure
of the code and are also useful for efficient decoding. Recently,
unconventional "tail-biting trellises" (TBT) have been studied
for well known codes like the (24,12,8) Golay code, the hexacode
and a few other short codes [1].

Minimal Tail-Biting Trellis: A tail-biting trellis with minimum
maximum number of states along with the minimum
product of all state space sizes, among all tail-biting trellises
for the code under all possible coordinate permutations, is
called a minimal tail-biting trellis for the code.

The total span bound: [1] If C is an (n, k, d) linear code
over $F_q$, then any n-section linear tail-biting trellis for C satisfies

$$\prod_{j=0}^{n-1} |S_j| \ge q^{k(d-1)} \qquad (1)$$

$$S_{\max} \ge q^{\lceil k(d-1)/n \rceil} \qquad (2)$$

If $q = p^m$, then for group trellises we have

$$S_{\max} \ge p^{\lceil mk(d-1)/n \rceil} \qquad (3)$$

Flat  Trellis:  A  tail-biting  trellis  is  said  to  be  flat  if  it  has  a 
constant  state  complexity  profile. 

It is well known that any k coordinates of an MDS code can
be taken as information positions. This means that minimum
weight vectors (of weight n - k + 1) with circular span n - k
can be obtained such that the successive n - k + 1 nonzero
components start from any specified coordinate position from
$\{0, 1, \ldots, (n - 1)\}$. It can be shown that any k such vectors
starting from different coordinate positions will constitute a
generator matrix for the code. Using these results in the next
section we specify the generator matrices that give minimal
tail-biting trellises in terms of these k coordinate positions.

II.  Minimal  Circular  Span  Generator  Matrices 

Theorem 1: For an (n, k) linear MDS code over $F_{p^m}$, let
$e = \gcd(n, k)$, $n' = n/e$, $k' = k/e$ and $n' = \alpha k' + \beta$, where $\alpha$ and
$\beta$ are integers. The generator matrix which has only minimum
weight vectors with consecutive nonzeros and with nonzeros
starting from the indices given by the set I given below gives
a minimal linear tail-biting trellis when the product of trellises
corresponding to each row vector is obtained:

$$I = \Bigl\{ \bigl[ \{ jn' + i(\alpha + 1) \mid i = 0, 1, \ldots, \beta \} \cup
\{ jn' + \beta(\alpha + 1) + (i - \beta)\alpha \mid i = \beta + 1, \ldots, k' - 1 \} \bigr] \Bigm|
j = 0, 1, \ldots, (e - 1) \Bigr\} \qquad (4)$$

Theorem 2: A necessary and sufficient condition for an (n, k)
linear MDS code over any finite field to admit a minimal linear
flat-trellis is that "n divides $k^2$".

Notice that the condition in Theorem 2 is independent of
the size of the field.
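The starting positions in Theorem 1 and the flatness test of Theorem 2 are simple enough to tabulate. The following is a minimal sketch, assuming that $\alpha$ and $\beta$ are the quotient and remainder of $n'$ divided by $k'$ (so $0 \le \beta < k'$; the text only says they are integers) and that the index set reads $jn' + i(\alpha+1)$ for $i = 0, \ldots, \beta$ and $jn' + \beta(\alpha+1) + (i-\beta)\alpha$ for $i = \beta+1, \ldots, k'-1$. The function names are hypothetical.

```python
from math import gcd

def start_indices(n, k):
    """Starting positions of the k minimum-weight rows, following the set I."""
    e = gcd(n, k)
    n1, k1 = n // e, k // e
    alpha, beta = divmod(n1, k1)  # assumed: n' = alpha*k' + beta, 0 <= beta < k'
    indices = []
    for j in range(e):
        for i in range(beta + 1):         # beta gaps of length alpha + 1
            indices.append(j * n1 + i * (alpha + 1))
        for i in range(beta + 1, k1):     # remaining gaps of length alpha
            indices.append(j * n1 + beta * (alpha + 1) + (i - beta) * alpha)
    return indices

def admits_flat_linear_trellis(n, k):
    """Condition of Theorem 2: n divides k^2."""
    return (k * k) % n == 0

positions = start_indices(7, 3)
```

For n = 7 and k = 3 the rows start at positions 0, 3 and 5, so the circular gaps between consecutive starts are 3, 2 and 2, as evenly spread as the parameters allow.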

Theorem 3: For an (n, k) linear MDS code over $F_{p^m}$, let
$e = \gcd(n, mk)$, $n' = n/e$, $k' = mk/e$. Also, let $k' = a n' + k''$
where $0 \le k'' < n'$ and $n' = \alpha k'' + \beta$, where $0 \le \beta < k''$
and $a$ and $\alpha$ are integers. The group-generator matrix which
has $a + 1$ minimum weight vectors with consecutive nonzeros
with nonzeros starting from the indices given by the set I
given below and $a$ minimum weight vectors with consecutive
nonzeros with nonzeros starting at all other time indices gives
a minimal group tail-biting trellis when the product of trellises
corresponding to each row vector is obtained, if the rows starting
at the same index are p-linearly independent (which
can always be achieved):

$$I = \Bigl\{ \bigl[ \{ jn' + i(\alpha + 1) \mid i = 0, 1, \ldots, \beta \} \cup
\{ jn' + \beta(\alpha + 1) + (i - \beta)\alpha \mid i = \beta + 1, \ldots, k'' - 1 \} \bigr] \Bigm|
j = 0, 1, \ldots, (e - 1) \Bigr\} \qquad (5)$$

Theorem 4: A necessary and sufficient condition for a linear
(n, k) MDS code over $F_{p^m}$ to admit a minimal group flat-trellis
is that "n divides $mk^2$".

Observe that the condition in Theorem 4 depends only on
m and not on the characteristic of the field.
Abstract — Uniformly efficient trellis decoders are
known for very few codes, and no general method is
known that can decide whether such a decoder exists.
It is shown that this question is substantially
simplifiable in the case of self-dual codes, when certain
subcodes meet the Griesmer bound with equality.
Furthermore, in many cases the result makes it possible
to count the number of uniformly efficient permutations.
In some cases the existence and number of
uniformly efficient trellises may be deduced directly
from the parameters of the code. Among the codes
that meet the criterion are the [24, 12, 8] Golay code,
for which the number of uniformly efficient permutations
is derived, four of the [32, 16, 8] doubly even
codes, and the [48, 24, 12] quadratic residue code, for
which a lower bound on the number of uniformly efficient
permutations is derived.

I.  Introduction 

We  consider  the  permutation  problem  for  trellis  decoders 
for  block  codes.  For  all  necessary  definitions,  background, 
and  references,  we  refer  to  Vardy’s  chapter  [4],  in  particular 
Section  5. 

For any fixed ordering of a code, a minimal trellis may be
found efficiently. However, an equivalent code will have its
own minimal trellis, which may be of substantially lower complexity.
As there is no useful distinction between two equivalent codes for
many purposes, the problem is to find a permutation that minimizes
the complexity in some sense.

Various definitions of optimality may be used; here we are
concerned with one of the strongest: that of “uniform efficiency.”
A permutation $\pi^*$ is uniformly efficient if the minimal trellis
for the corresponding code minimizes the state complexity at each
time unit simultaneously, i.e., if $s_i(\pi^*(C)) \le s_i(\pi(C))$
for all permutations $\pi$ and all $i$. Such a permutation
may or may not exist.

There  are  very  few  codes  for  which  such  permutations  are 
known.  These  include  the  binary  Reed-Muller  codes,  MDS 
codes,  the  [24, 12, 8]  Golay  code,  the  [48, 24, 12]  quadratic 
residue  code,  and  the  [16,7,6]  lexicode  [4].  Classification  of 
the  existence  or  nonexistence  question  for  short  self-dual  codes 
has  been  carried  out  by  Encheva  and  Cohen  [2,  3]. 

Here we consider self-dual codes. Our main result, when
it applies, provides a way of demonstrating the existence of a
uniformly efficient permutation and of counting all such
permutations. This resolves a question posed by McEliece for the
case of the [24,12,8] Golay code.

II.  Main  result 

Theorem. Let $C$ be a self-dual code. Suppose $k_{n/2}(C)$ is
such that $n/2 = g_q(k_{n/2}, d)$, where
$g_q(k, d) = \sum_{i=0}^{k-1} \lceil d/q^i \rceil$ is
the Griesmer bound function. Then:

(a) the code satisfies the double chain condition;

(b) the code meets the DLP bound;

(c) the code has the smallest state complexity in each component
among all self-dual codes of the same length and dimension, and
at least the same distance;

(d) the optimum state complexity profile is $s_i = i - 2g^{-1}(i, d)$
for $i \le n/2$, and $s_{n-i} = s_i$, where
$g^{-1}(i, d) = \max\{ j \mid i \ge g(j, d) \}$;

(e) a permutation is uniformly efficient if and only if it is of
the form

$$G = \begin{bmatrix} G_1 & 0 \\ 0 & t(G_2) \\ E & F \end{bmatrix}$$

where $C_1 = \langle G_1 \rangle$ is a length $n/2$, distance $d$ code that
meets the Griesmer bound with equality, and is in chain
condition order; and where $G_2$ generates a code with the
same parameters as $C_1$, and is in chain condition order;
and where $t(G_2)$ is the column reverse of $G_2$.

¹This work was supported in part by the U. S. Army Research
Office under Grant DAAH04-96-1-0377.
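Part (d) can be evaluated directly from the code parameters. The sketch below assumes the profile reads $s_i = i - 2g^{-1}(i,d)$ for $i \le n/2$ with the mirror symmetry $s_{n-i} = s_i$; the function names are illustrative. It only evaluates the formula and does not check the hypothesis of the theorem for a given code.

```python
def griesmer(k: int, d: int, q: int = 2) -> int:
    """Griesmer bound function g_q(k, d) = sum_{i<k} ceil(d / q^i)."""
    return sum(-(-d // q**i) for i in range(k))   # integer ceiling division


def griesmer_inverse(i: int, d: int, q: int = 2) -> int:
    """g^{-1}(i, d) = max{ j : i >= g(j, d) }."""
    j = 0
    while griesmer(j + 1, d, q) <= i:
        j += 1
    return j


def state_profile(n: int, d: int, q: int = 2) -> list[int]:
    """Profile s_0..s_n from part (d), mirrored about i = n/2."""
    half = [i - 2 * griesmer_inverse(i, d, q) for i in range(n // 2 + 1)]
    return half + half[-2::-1]
```

For the [24,12,8] parameters, `griesmer(2, 8)` equals 12, which matches the hypothesis $n/2 = g_q(k_{n/2}, d)$ with $k_{n/2} = 2$, and the resulting profile peaks at 9 with the value 8 at the midpoint.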

III.  Application 

The main result applies to the following self-dual codes: the
binary [8,4,4], [12,6,4], [14,7,4], and [24,12,8] codes, one of
the [16,8,4] codes, four of the eight [32,16,8] codes, and the
[48,24,12] quadratic residue code; the ternary [12,6,6] Golay
code, and both ternary [24,12,9] codes. The results for
three of the [32,16,8] binary codes and both ternary [24,12,9]
codes are new. In addition, using part (e), we may in some
cases easily find the number of uniformly efficient permutations:
this happens for the binary [12,6,4], [14,7,4], [16,8,4],
and [24,12,8] codes, and the ternary [12,6,6] and [24,12,9]
codes. The number of uniformly efficient permutations for
the [24,12,8] Golay code is, from part (e), equal to the number
of $X_{12}$'s times the square of the number of chain condition
orderings of a [12,2,8] code, i.e., $35420\,(3 \cdot 8! \cdot 4!)^2$ permutations
out of all $24!$, a fraction of approximately $4.81 \times 10^{-7}$ of all
permutations.
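The quoted fraction can be reproduced with exact integer arithmetic. This short check assumes the count reads $35420\,(3 \cdot 8! \cdot 4!)^2$ out of $24!$; the variable names are illustrative.

```python
from math import factorial

x12_count = 35420                                    # number of X_12's quoted in the text
chain_orderings = 3 * factorial(8) * factorial(4)    # orderings of a [12,2,8] code
uniformly_efficient = x12_count * chain_orderings ** 2
fraction = uniformly_efficient / factorial(24)

print(f"{uniformly_efficient} permutations, fraction {fraction:.3e}")
```

The computed fraction agrees with the approximate value of $4.81 \times 10^{-7}$ stated above.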

The main result may be generalized at the cost of a nontrivial
increase in the difficulty of application [1]; the generalization
applies to many more self-dual codes.
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Abstract  —  In  this  paper  we  prove  that  for  general 
memoryless  binary  input  channels,  most  ensembles  of 
parallel  and  serial  turbo  codes,  with  fixed  component 
codes, are “good” in the sense that with maximum
likelihood decoding, their word (or bit) error probability
decreases to zero as the block length increases,
provided the noise is below a finite threshold. Our
proof uses the classical union bound, which shows that
under very general conditions, if the noise is below a
certain threshold, the word (or bit) error probability
is controlled by the low-weight codewords as the block
length approaches infinity. Our main coding theorems
then follow from a study of the low-weight terms in
the ensemble weight enumerator. Using this methodology,
we can prove that the threshold is finite for
most ensembles of parallel and serial turbo codes.

I.  Introduction 

This paper addresses the basic question as to whether turbo
codes, both parallel and serial, are “good” in the sense of
MacKay [5]. The earliest work on this problem is in [1, 2],
where “interleaving gain” was first proposed and investigated,
but this was not fully rigorous.

In this paper, we restrict ourselves to memoryless binary
input channels with maximum likelihood decoding. Our specific
goal is to prove a general coding theorem for ensembles
of parallel or serial concatenated convolutional codes, where
the ensemble is taken with respect to all possible interleavers.
The tools we use are the union bound and the ensemble weight
enumerator. Previously, in [3], we analyzed RA codes on
AWGN channels by deriving the input-output weight enumerator
(IOWE), from which we could compute a signal-to-noise
ratio threshold above which the ensemble is “good.” This
technique fails for complex component codes, because calculation
of the IOWE is intractable. Fortunately, to prove coding
theorems, the exact IOWE is not indispensable. Instead, a
good upper bound on it proves to be sufficient.

II.  Union  Bounds 

Consider a linear $(n, N)$ block code $C$ with rate $R = N/n$.
The union bound on the word error probability $P_W$ of the
code $C$ over a memoryless binary input channel, using ML
decoding, has the form:

$$P_W \le \sum_{h=1}^{n} A_h e^{-\alpha h} \qquad (1)$$

where $A_h$ denotes the number of codewords in $C$ with output
weight $h$. The parameter $\alpha$ is determined by the channel.
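To make the role of the weight enumerator concrete, here is a minimal sketch that evaluates a union bound of the common form $\sum_h A_h e^{-\alpha h}$. That form, the choice of $\alpha = -\ln\big(2\sqrt{p(1-p)}\big)$ for a binary symmetric channel with crossover probability $p$ (the Bhattacharyya parameter), and the (7,4) Hamming code used as input are all assumptions made for illustration; none of them is taken from the paper.

```python
from math import exp, log, sqrt


def union_bound(weights: dict[int, int], alpha: float) -> float:
    """Sum of A_h * exp(-alpha * h) over nonzero weights h."""
    return sum(a_h * exp(-alpha * h) for h, a_h in weights.items() if h > 0)


def bsc_alpha(p: float) -> float:
    """Assumed channel parameter for a BSC: -ln of the Bhattacharyya parameter."""
    return -log(2.0 * sqrt(p * (1.0 - p)))


# Weight enumerator of the (7,4) Hamming code: A_0=1, A_3=7, A_4=7, A_7=1.
HAMMING_7_4 = {0: 1, 3: 7, 4: 7, 7: 1}

bound = union_bound(HAMMING_7_4, bsc_alpha(0.01))
```

The low-weight terms ($h = 3$ here) dominate the sum as the channel improves, which is the behaviour the coding theorems exploit.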

1This  work  was  supported  by  NSF  grant  no.  CCR-9804793,  and 
grants  from  Sony  and  Qualcomm. 


III.  Main  Result 

Theorem 1 For an ensemble of a parallel concatenated convolutional
code with recursive components, if the number of
recursive parallel branches is $k > 2$, then there exists a positive
number $\gamma_0$ such that for any fixed $\alpha > \gamma_0$,

$$P_W = O(n^{-k+2+\epsilon}) \qquad (2)$$

$$P_b = O(n^{-k+1+\epsilon}) \qquad (3)$$

for arbitrary $\epsilon > 0$.

Theorem 2 For an ensemble of a serial concatenated convolutional
code with recursive inner code, if the free distance
of the outer code $d_f$ is at least 3, then there exists a positive
number $\gamma_0$ such that for any fixed $\alpha > \gamma_0$,

$$P_W = O(n^{-\lfloor (d_f - 1)/2 \rfloor + \epsilon}) \qquad (4)$$

$$P_b = O(n^{-\lfloor (d_f + 1)/2 \rfloor + \epsilon}) \qquad (5)$$

for arbitrary $\epsilon > 0$.

IV.  Remarks 

The thresholds derived by the classical union bound are by no
means the best possible. In the following table, we compare
the thresholds for RA codes over the BSC derived by the union
bound (UB) with those derived by the typical set decoder (TD)
bound [6].

q    R      UB: $\gamma_0$   TD: $\gamma_0$   Capacity
3    1/3    0.091            0.132            0.174
4    1/4    0.132            0.191            0.215
5    1/5    0.1 03           0.228            0.243
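The "Capacity" column can be reproduced independently: assuming the thresholds are BSC crossover probabilities, the entry for rate $R$ is the $p$ at which $1 - H(p) = R$. The sketch below solves this by bisection; the function names are illustrative.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def bsc_shannon_limit(rate: float, tol: float = 1e-12) -> float:
    """Largest crossover probability p in [0, 1/2] with 1 - H(p) >= rate."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - binary_entropy(mid) > rate:
            lo = mid          # capacity still exceeds the rate: p can grow
        else:
            hi = mid
    return (lo + hi) / 2.0


limits = {q: bsc_shannon_limit(1.0 / q) for q in (3, 4, 5)}
```

The computed limits agree with the tabulated capacity values to the three decimals shown.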
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Abstract —  We  construct  irregular  turbocodes  with  systematic 
bits  that  participate  in  varying  numbers  of  trellis  sections.  By 
making the original rate 1/2 turbocode of Berrou et al. slightly
irregular, we obtain a coding gain of 0.15 dB at BER = $10^{-4}$.

I.  Irregular  turbocodes 

Recently, significant coding gains have been obtained by
making the codeword bits of low density parity check codes
participate in varying numbers of parity checks (cf. [1, 2]).

What we call an irregular turbocode [3] has the form shown
in Fig. 1, which is a type of “trellis-constrained code” [3]. One
way to describe the code is by a degree profile, $f_d \in [0, 1]$,
$d \in \{1, 2, \ldots, D\}$, where $f_d$ is the fraction of codeword bits that
have degree $d$ and $D$ is the maximum degree. Each codeword
bit with degree $d$ is repeated $d$ times before being permuted
and connected to the trellis for a convolutional code. If the
bits in the convolutional code are partitioned into “systematic
bits” and “parity bits”, then by connecting each parity bit to a
degree 1 codeword bit, we can encode in linear time by copying,
permuting and encoding the systematic bits.

The overall rate $R$ of an irregular turbocode is related to the
rate $R'$ of the convolutional code and the average degree $\bar{d}$ by
$\bar{d}(1 - R') = 1 - R$. So, if the average degree is increased, the
rate of the convolutional code must also be increased (e.g., by
puncturing or redesign) to keep the overall rate constant.
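The rate relation $\bar{d}(1 - R') = 1 - R$ is simple to evaluate for a given degree profile. In the sketch below the two profiles are assumptions chosen for illustration (half the bits of degree 1 and half of degree 2 for the regular case, and a variant in which 5% of all bits are moved from degree 2 to degree 10); the paper does not list its profile explicitly, and the function names are hypothetical.

```python
def average_degree(profile: dict[int, float]) -> float:
    """Average degree sum_d d * f_d of a degree profile {d: f_d}."""
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("degree fractions must sum to 1")
    return sum(d * f for d, f in profile.items())


def convolutional_rate(overall_rate: float, profile: dict[int, float]) -> float:
    """Solve d_bar * (1 - R') = 1 - R for the convolutional code rate R'."""
    return 1.0 - (1.0 - overall_rate) / average_degree(profile)


regular = {1: 0.5, 2: 0.5}                 # assumed profile, overall rate 1/2
irregular = {1: 0.5, 2: 0.45, 10: 0.05}    # assumed: 5% of bits raised to degree 10

r_regular = convolutional_rate(0.5, regular)
r_irregular = convolutional_rate(0.5, irregular)
```

Under these assumptions the convolutional code rate rises from 2/3 to about 0.737, illustrating why the constituent code must be punctured or redesigned when the average degree grows.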

II.  Decoding  irregular  turbocodes 

Fig. 1 can be interpreted as the graphical model (factor
graph, Bayesian network, etc.) [4, 5] for the irregular turbocode.
Decoding consists of applying the sum-product algorithm
(a generalized form of turbodecoding) in this graph.

The decoder first computes the $N$ channel output log-likelihood
ratios $L^0_1, \ldots, L^0_N$, and then repeats each log-likelihood
ratio appropriately. For bit $i$ with degree $d$, set
$L_{i,1} \leftarrow L^0_i, \ldots, L_{i,d} \leftarrow L^0_i$. Next, the log-likelihood ratios
are permuted and fed into the BCJR algorithm for the convolutional
code, which, for bit $i$, produces $d$ a posteriori
log-probability ratios, $L'_{i,1}, \ldots, L'_{i,d}$. The current estimate of the
log-probability ratio for bit $i$ is
$\hat{L}_i \leftarrow L^0_i + \sum_{k=1}^{d} (L'_{i,k} - L_{i,k})$.
The inputs to the BCJR algorithm for the next iteration
are computed by subtracting off the corresponding outputs
from the BCJR algorithm produced by the previous iteration:
$L_{i,k} \leftarrow \hat{L}_i - L'_{i,k}$.

III.  Discussion 

Fig. 2 shows the simulated BER-$E_b/N_0$ curves for the original
regular turbocode and an irregular turbocode that we came
up with by making 5% of the codeword bits in the original
turbocode have degree 10. The irregular turbocode clearly performs
better than the regular turbocode for BER > $10^{-4}$.

For high $E_b/N_0$, most of the errors for the irregular turbocode
were due to low-weight codewords. Our permuter
was drawn from a uniform distribution over permuters, but
we expect the BER “flattening” effect can be significantly reduced
by carefully designing the permuter and the convolutional
code, possibly by extending the method of “density evolution”
to convolutional codes. We are also studying ways of
constraining the degree 1 “parity” bits (i.e., increasing their
degree) to eliminate low-weight codewords.

Figure 1: A general irregular turbocode. For $d = 1, \ldots, D$, fraction
$f_d$ of the codeword bits are repeated $d$ times, permuted and connected
to a convolutional code.

Figure 2: Performances of the original block length $N = 131{,}072$
turbocode (dashed line) and one of its irregular cousins (solid line).

For BER > $10^{-4}$ this irregular turbocode performs in the
same regime as the best known irregular Gallager code [2].
We expect the improvement in performance to be even more
significant for lower-rate codes, since the constituent convolutional
code can have lower rate, thus eliminating many low-weight
codewords while retaining the benefit of irregularity.
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Abstract  — 

The purpose of this paper is to contradict a common
myth about turbo codes. We are specifically
addressing R = 1/3 parallel-concatenated codes using
systematic, recursive constituent codes.

Myth: Turbo codes consisting of constituent codes
with large memory order (i.e., a large number of trellis
states) are not as effective as the original
Berrou-Glavieux-Thitimajshima turbo code [1] in the waterfall
region of small signal-to-noise ratios (SNR’s).

This myth is contradicted by a turbo code whose
recursive constituent codes have 256 states. Decoding
this turbo code with the BCJR APP decoder gives
bit-error-rate (BER) and frame-error-rate (FER) performance
better than the original (Berrou) turbo code
at all SNR’s.


I.  Summary 

The iterative BCJR APP decoding algorithm [1] [2] permits
relatively quick decoding of turbo codes. Although the
decoding algorithm is suboptimal, it does perform very close
to the optimal maximum likelihood (ML) decoder except at
very small SNR’s in the waterfall region. The iterative algorithm
has difficulty starting the convergence toward a solution.
The first decoding iteration(s) of the constituent codes must
produce a posteriori estimates that are good enough a priori
estimates to push the subsequent iterations towards the ML
solution instead of stalling the convergence in some region of
the solution space. In general, at very small SNR’s, a systematic,
recursive constituent code with short cycle length will
produce better extrinsic APP estimates for the information
bits than a code with a long cycle length. For example, the
8-state code below has a cycle length of 3 (where the parentheses
just indicate the periodic cycles within the recursive
portion of the impulse response):


$$\frac{1 + D + D^2 + D^3}{1 + D^3} \;\Leftrightarrow\; \frac{[1111]}{[1001]} \;\Rightarrow\; 1(110)(110)\ldots$$


The single one bit, out in front, can be considered as the feedforward
portion of the impulse response. A turbo code using
this constituent code does give good extrinsic estimates at the
start of the iterative decoding algorithm at very small SNR’s
(hence it starts to diverge away from the starting point); however,
it has difficulty finishing the convergence to the ML solution.
We can “strengthen” the constituent code while retaining
the short cycle length by increasing the complexity of
the feedforward portion of the impulse response. Consider the
code:


$$\frac{[110000011]}{[111]} \;\Rightarrow\; 1011011(110)(110)\ldots$$


'This  work  was  supported  by  NASA  Grants  NAG5-557  and 
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We  call  this  a  Big  Numerator-Little  Denominator  (BN-LD) 
code.  It  is  described  by  a  trellis  with  256  states.  The  turbo 
code  using  this  256-state  constituent  code  has  an  improved 
ability  to  finish  the  convergence  to  the  ML  solution  compared 
to the previous 8-state code with a single one in the feedforward
portion. However, analogous to feedforward convolutional
codes, the increased complexity of the feedforward portion
of the impulse response does somewhat reduce the convergence
start-up ability that is due to the short cycle length.
See Fig. 1 for BER and FER performance simulations of this
256-state BN-LD turbo code compared to the (Berrou) code.

The feedforward portion of the BN-LD code was not picked
at random, but rather specifically designed to produce an additional
“thinning” of the closest codewords due to weight-2
inputs. This is a new degree of freedom that can be exploited
to give a new distance profile for the lowest weight codewords.

A useful application of BN-LD codes is as a replacement
for the accumulator code, $[1, \frac{1}{1+D}] \Rightarrow (1)(1)(1)\ldots$, which is
used in serial concatenation and other schemes [3]. The BN-LD
accumulator code has the form $[1, \frac{n(D)}{1+D}]$ where the order
of the numerator polynomial $n(D)$ is two or greater. For example,
the code $[1, \frac{1+D^3+D^4}{1+D}] \Rightarrow 1110(1)(1)(1)\ldots$ has a better
ability to finish converging to the ML solution compared to
the standard accumulator code.
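The impulse responses quoted in this summary can be checked by long division over GF(2). The sketch below takes numerator and denominator coefficient lists in increasing powers of $D$; the function name is illustrative. The numerator $1 + D^3 + D^4$ used for the BN-LD accumulator example is inferred here from the quoted response $1110(1)(1)(1)\ldots$ and the denominator $1 + D$.

```python
def impulse_response(num: list[int], den: list[int], length: int) -> list[int]:
    """First `length` coefficients of num(D)/den(D) over GF(2).

    Coefficients are listed from D^0 upward; den[0] must be 1.
    """
    if not den or den[0] != 1:
        raise ValueError("denominator must have constant term 1")
    out: list[int] = []
    for t in range(length):
        bit = num[t] if t < len(num) else 0
        # Feedback: XOR in earlier outputs selected by the denominator taps.
        for j in range(1, len(den)):
            if den[j] and t - j >= 0:
                bit ^= out[t - j]
        out.append(bit)
    return out


eight_state = impulse_response([1, 1, 1, 1], [1, 0, 0, 1], 10)
bn_ld = impulse_response([1, 1, 0, 0, 0, 0, 0, 1, 1], [1, 1, 1], 13)
accumulator = impulse_response([1], [1, 1], 5)
bn_ld_accumulator = impulse_response([1, 0, 0, 1, 1], [1, 1], 7)
```

Both constituent codes settle into the period-3 cycle (110); the 256-state code differs only in its longer feedforward prefix 1011011.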


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Figure 1: Simulation results for the rate-1/3, 256-state BN-LD
turbo code (labeled “Big Numerator”) and the Berrou turbo code.
(The interleaver frame size is 16,384, and 18 iterations are used.)
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Abstract  -  This  paper  describes  a  new  class  of  codes, 
chaotic turbo codes. They were born from a symbiosis
between a chaotic digital encoder and a turbo code. This
paper investigates the most important properties of both
chaotic digital encoders and turbo encoders in order to
understand how the two complement each other. A Chaotic
Turbo Encoder is then described and initial results are
presented.

i.  Introduction 

A  chaotic  digital  encoder  was  defined  for  the  first  time  in  [1] 
as  a  non-linear  digital  filter  with  finite  precision  (8  bits) 
which  behaves  in  a  quasi-chaotic  fashion,  both  with  zero 
and  nonzero  input  sequences.  A  simple  chaotic  encoder  is 
shown  in  Figure  1  [1]. 


Figure  1 :  Chaotic  Digital  Encoder 


The  main  features  of  chaotic  digital  encoders  that  are  used 
in  this  paper  are: 

•  The  system  is  digital  which  makes  possible  its 
integration  with  a  turbo  code. 

•  The  output  of  a  chaotic  digital  encoder  with  arbitrary 
inputs  has  a  broadband  noiselike  spectrum. 

•  The autocorrelation function of the output is similar to
that of an uncorrelated noise sequence.

•  The  outputs  of  a  chaotic  digital  encoder  with  almost  all 
arbitrary  inputs  are  uncorrelated  to  the  input  for  almost 
all  choices  of  initial  conditions. 

•  The  outputs  of  a  chaotic  digital  encoder  with  the  same 
input  sequences  are  uncorrelated  to  one  another  for 
almost  all  choices  of  different  initial  conditions. 

•  For  almost  all  choices  of  input  for  two  identical  chaotic 
digital  encoders  having  different  but  arbitrarily  close 
initial  states,  the  states  of  the  two  encoders  will  diverge. 

Another important result in this area is that chaotic circuits
taken from an appropriate class can be made to synchronise.
It has been shown that a chaotic system, in the presence of
a continuous perturbation, is able to asymptotically track a
replica of itself if it can be decomposed into subsystems with
stable Lyapunov exponents. Binary digits, $X_k$, are presented
one at a time to the encoder and mapped onto either 0 or
$2^{L-1}$. The additions are on $L$ bits and the arithmetic is
modulo $2^L$. The non-linear map is the LCIRC block, which
performs a rotate left operation. There are only two delay
elements (D) in the encoder, of $L$ bits each. Each encoder
output, $Y_k$, is $L$ bits wide and can modulate one or more
pulses.
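The ingredients described above (input mapped to 0 or $2^{L-1}$, $L$-bit additions modulo $2^L$, a left-rotate non-linearity, and two $L$-bit delay elements) can be sketched as follows. The exact wiring of Figure 1 is not recoverable from the text, so the recursion $Y_k = X'_k + \mathrm{LCIRC}(Y_{k-1}) + Y_{k-2} \bmod 2^L$ used here is an assumption, as are the function names.

```python
def lcirc(value: int, width: int) -> int:
    """Rotate an unsigned `width`-bit value left by one position."""
    mask = (1 << width) - 1
    return ((value << 1) & mask) | ((value >> (width - 1)) & 1)


def chaotic_encode(bits: list[int], width: int = 8,
                   state: tuple[int, int] = (0, 0)) -> list[int]:
    """Assumed second-order recursion with arithmetic modulo 2^width."""
    mask = (1 << width) - 1
    y1, y2 = state                       # contents of the two delay elements
    out = []
    for b in bits:
        u = (b & 1) << (width - 1)       # map the bit onto 0 or 2^(L-1)
        y = (u + lcirc(y1, width) + y2) & mask
        out.append(y)
        y1, y2 = y, y1                   # shift the delay line
    return out
```

With this assumed wiring, changing only the initial `state` gives a different output sequence for the same input, which is the property the text relies on when comparing the encoder to an interleaver.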

ii.  Chaotic  Turbo  Encoder 

The chaotic digital encoder shown in Figure 1 could replace
the recursive systematic encoder used in a turbo code [2].
The key element in a turbo encoder is the interleaver. The
role of the interleaver is to feed the same data into the second
encoder in a different random order, so that, at the
receiving end, each decoder is able to make
“independent” decisions for the same data bit. A similar
effect to interleaving can be achieved with the chaotic
digital encoder if the initial states are different. Both
encoders use feedback registers, one using binary data, the
other L-tuples. The advantage of using a chaotic encoder
in a turbo encoder consists in the possible elimination of the
interleaver, therefore reducing delay in the system. The
only difference appears in the non-linearity inserted in the
chaotic digital encoder.

iii.  Conclusions

The paper described a chaotic turbo encoder. Simulations of
the new Chaotic Turbo Codes are expected to show an
improvement on the results reported in [3], which are based
on a decision directed state feedback decoder. Similar work
in the area of secure communication using chaotic signals
without coding was reported in [4] for both AWGN and
mobile channels. The use of turbo codes might prove a key
element in reducing the high signal-to-noise ratios required
by chaotic systems.
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Abstract  —  If  a  random  codebook  for  lossy  source 
coding  is  generated  by  a  non-optimum  reproduction 
distribution $Q$, then the entropy of the index of the $D$-matching
codeword is reduced by conditioning on the
codebook: the number of bits saved is equal to the divergence
between the “favorite type” in the codebook
and the generating distribution $Q$.

I.  Statement  of  Main  Result 

Consider coding a source word $\mathbf{X} = X_1 \ldots X_l$, generated i.i.d.
by a distribution $P$ over a finite alphabet $\mathcal{X}$, into a code word
$\mathbf{Y} = Y_1 \ldots Y_l$ (“reconstruction”) from a finite alphabet $\mathcal{Y}$, under
the distortion constraint $d(\mathbf{X}, \mathbf{Y}) = \frac{1}{l} \sum_{i=1}^{l} d(X_i, Y_i) \le D$.
As shown in [1], if a codebook $\mathbf{Y}_1, \mathbf{Y}_2, \ldots$ of words in $\mathcal{Y}^l$ is
generated i.i.d. according to a distribution $Q$, and if we denote
by $N_l$ the index of the first codeword that satisfies the
distortion constraint, then

$$\frac{1}{l} \log(N_l) \to R(P, Q, D) \quad \text{in probability,} \qquad (1)$$

where the constant $R(P, Q, D)$ is given by [2]

$$R(P, Q, D) = \min_{Q'} \{ I_m(P \| Q', D) + D(Q' \| Q) \}. \qquad (2)$$

Here $D(\cdot \| \cdot)$ denotes divergence (or relative entropy), and
the function $I_m$ denotes the “lower mutual information”
$I_m(P \| Q, D) = \min_W I(P, W)$, where $I(P, W)$ denotes the
mutual information associated with input distribution $P$ and
transition distribution $W$ from $\mathcal{X}$ to $\mathcal{Y}$, and the minimization
is taken over all $W$’s such that the input $P$ induces output
distribution $Q$ and average distortion less than or equal to $D$
(if no such $W$ exists then $I_m(P \| Q, D)$ is equal to infinity).
The “mismatched” coding rate function $R(P, Q, D)$ is greater
than or equal to the rate distortion function of the source,
with equality if and only if $Q$ is an optimum reproduction
distribution which realizes the rate distortion function.

It was further shown in [2] that for large word length, the
random type $T_{N_l}$ of the $D$-matching codeword $\mathbf{Y}_{N_l}$ concentrates
around a limiting distribution:

$$T_{N_l} \to Q^*_{P,Q,D} \quad \text{as } l \to \infty \text{ in probability,} \qquad (3)$$

where $Q^*_{P,Q,D}$ is the distribution $Q'$ which achieves the minimum
in (2). This distribution, called “the favorite type”
(although $Q^*_{P,Q,D}$ is in general not an $l$-type), strikes the optimum
balance between covering efficiency and frequency in the
codebook. It follows from (3) that most of the first $2^{lR(P,Q,D)}$
codewords in the codebook are asymptotically useless; only
those having a type close to $Q^*_{P,Q,D}$, whose fraction in the
codebook is only $\approx 2^{-l D(Q^*_{P,Q,D} \| Q)}$, have a good chance to
be the first to $D$-match the source word. In a sense, we are
paying extra $D(Q^*_{P,Q,D} \| Q)$ bits in coding rate. Our main result
shows that this redundancy can be removed by entropy
coding conditioned on the codebook.

¹This work was supported in part by the BSF grant no. 9800309.

Theorem 1 If $Q$ is positive everywhere, then

$$\lim_{l \to \infty} \frac{1}{l} H(N_l \mid \mathbf{Y}_1, \mathbf{Y}_2, \ldots) = I_m(P \| Q^*_{P,Q,D}, D). \qquad (4)$$

Note that without conditioning on the codebook, the index
$N_l$ is approximately uniformly distributed over the range
$\{1 \ldots 2^{lR(P,Q,D)}\}$, so its normalized entropy is equal to $R(P, Q, D)$.

II.  Example 

Assume $\mathcal{X} = \mathcal{Y} = \{0, \ldots, |\mathcal{X}| - 1\}$. Consider a uniform
codebook generating distribution $Q(y) = 1/|\mathcal{Y}|\ \forall y$, and a symmetric
distortion measure of the form $d(x, y) = d(y - x)$, where
the subtraction is modulo-$|\mathcal{X}|$. Then

$$R(P, Q, D) = \log |\mathcal{X}| - H_{\max}$$

and

$$Q^*_{P,Q,D} = P * V^*;$$

hence the conditional index entropy (4) is given by

$$I_m(P \| Q^*_{P,Q,D}, D) = H(P * V^*) - H_{\max},$$

where $H_{\max}$ and $V^*$ denote the maximum entropy under a
$D$-constraint and the maximum-entropy achieving distribution,
respectively:

$$H_{\max} = H(V^*) = \max_{V :\ \sum_y V(y) d(y) \le D} H(V),$$

and the $*$ sign denotes a circular convolution (i.e., $P * V^*$ is
the distribution of the independent sum of a random variable
$\sim P$ and a random variable $\sim V^*$).
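The quantities in this example can be evaluated numerically. The sketch below assumes the standard fact that the entropy-maximizing distribution under a linear constraint has the exponential form $V^*(y) \propto e^{-s\,d(y)}$ and finds $s \ge 0$ by bisection; it also assumes $D$ exceeds the smallest distortion value. All function names are illustrative, and entropies are in bits.

```python
from math import exp, log2


def entropy(dist: list[float]) -> float:
    return -sum(p * log2(p) for p in dist if p > 0)


def divergence(a: list[float], b: list[float]) -> float:
    return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)


def max_entropy_dist(d: list[float], D: float) -> list[float]:
    """Maximize H(V) subject to sum_y V(y) d(y) <= D."""
    def gibbs(s: float) -> list[float]:
        w = [exp(-s * dy) for dy in d]
        z = sum(w)
        return [x / z for x in w]

    def mean(v: list[float]) -> float:
        return sum(vy * dy for vy, dy in zip(v, d))

    if mean(gibbs(0.0)) <= D:            # uniform already meets the constraint
        return gibbs(0.0)
    lo, hi = 0.0, 1.0
    while mean(gibbs(hi)) > D:           # bracket the multiplier s
        hi *= 2.0
    for _ in range(200):                 # bisection on s
        mid = (lo + hi) / 2.0
        if mean(gibbs(mid)) > D:
            lo = mid
        else:
            hi = mid
    return gibbs(hi)


def circular_convolution(p: list[float], v: list[float]) -> list[float]:
    n = len(p)
    return [sum(p[x] * v[(y - x) % n] for x in range(n)) for y in range(n)]


def example(p: list[float], d: list[float], D: float):
    """Return (R, conditional index entropy, bits saved) for uniform Q."""
    n = len(p)
    v_star = max_entropy_dist(d, D)
    h_max = entropy(v_star)
    favorite = circular_convolution(p, v_star)
    rate = log2(n) - h_max
    conditional = entropy(favorite) - h_max
    saved = divergence(favorite, [1.0 / n] * n)
    return rate, conditional, saved
```

For a binary source with $P = (0.8, 0.2)$, Hamming distortion and $D = 0.1$, the returned rate equals the conditional entropy plus the divergence term, as the decomposition in (2) requires.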

A  generalization  of  this  work  to  the  continuous  case  links 
the  conditional  entropy  (4)  with  the  entropy  of  dithered  lattice 
quantizers. 
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Abstract  — -  We  consider  adaptive  sequential  lossy 
coding  of  bounded  individual  sequences.  The  encoder 
and  the  decoder  are  connected  via  a  noiseless  channel 
of  capacity  R  and  both  are  assumed  to  have  zero  delay. 
No probabilistic assumptions are made on how the sequence
to be encoded is generated. For any bounded
sequence of length $n$, the distortion redundancy is defined
as the normalized cumulative squared distortion
of the sequential scheme minus the normalized cumulative
squared distortion of the best scalar quantizer
of rate $R$ which is matched to this particular sequence.
We demonstrate the existence of a zero-delay sequential
scheme which uses common randomization in the
encoder and the decoder such that the normalized
maximum distortion redundancy converges to zero at
a rate $n^{-1/5} \log n$.

I.  Summary 

A (randomized) zero-delay sequential source code of rate
$R = \log M$ is described by an encoder-decoder pair which
are connected via a noiseless channel of capacity $R$. It is
assumed that both the encoder and the decoder have access
to a common sequence of random variables $U_1, U_2, \ldots$, where
each $U_i$ is uniformly distributed on the interval $[0, 1]$. The
input to the encoder is a sequence of real numbers $x_1, x_2, \ldots$
assumed to be bounded such that $x_i \in [0, 1]$ for all $i \ge 1$.
At each time instant $i = 1, 2, \ldots$, the encoder observes $x_i$
and the random number $U_i$. Based on $x_i$, $U_i$, and the past
input values $x^{i-1} = (x_1, \ldots, x_{i-1})$, the encoder produces a
channel symbol $y_i \in \{1, 2, \ldots, M\}$ which is then transmitted
to the decoder. After receiving $y_i$, the decoder outputs the
reconstruction value $\hat{x}_i$ based on $U_i$ and the channel symbols
$y^i = (y_1, \ldots, y_i)$ received so far.

More formally, the code is given by a sequence of encoder-decoder
functions $\{f_i, g_i\}_{i=1}^{\infty}$, where

$$f_i : [0, 1]^i \times [0, 1] \to \{1, 2, \ldots, M\}$$

and

$$g_i : \{1, 2, \ldots, M\}^i \times [0, 1] \to [0, 1],$$

so that $y_i = f_i(x^i, U_i)$ and $\hat{x}_i = g_i(y^i, U_i)$, $i = 1, 2, \ldots$. Note
that there is no delay in the encoding and decoding process.
Zero-delay schemes have an obvious advantage over other coding
methods (such as block codes) in applications where decoding
delay is a crucial factor.

The normalized cumulative squared distortion of the sequential
scheme at time instant $n$ is given by

$$D_n(x^n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2$$
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where  the  dependence  of  Dn  on  the  randomizing  sequence  is 
suppressed in the notation. The expected cumulative distortion
is $\bar{D}_n(x^n) = E[D_n(x^n)]$, where the expectation is taken
with respect to the randomizing sequence $U^n = (U_1, \ldots, U_n)$.

Let $\mathcal{Q}$ denote the collection of all $M$-level scalar quantizers
over $[0, 1]$. For any sequence $x^n$, let $D_n^*(x^n)$ denote the minimum
normalized cumulative distortion in quantizing $x^n$ with
an $M$-level scalar quantizer, that is, let

$$D_n^*(x^n) = \min_{Q \in \mathcal{Q}} \frac{1}{n} \sum_{i=1}^{n} (x_i - Q(x_i))^2.$$

Note that to find a $Q \in \mathcal{Q}$ achieving $D_n^*(x^n)$ one has to know
the entire sequence $x^n$ in advance. The next theorem asserts
that there exists a zero-delay sequential source code of rate $R$
which, for any bounded input sequence, performs asymptotically
as well as the best scalar quantizer of rate $R$ matched to
the entire sequence.
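The benchmark $D_n^*(x^n)$ can be computed offline once the whole sequence is known. The sketch below uses a standard dynamic program over the sorted samples, relying on the fact that an optimal scalar quantizer for squared error partitions the sorted data into contiguous cells, each represented by its mean. It is an illustration of the benchmark only, not of the sequential scheme of the paper, and the function name is hypothetical.

```python
def best_quantizer_distortion(x: list[float], levels: int) -> float:
    """Normalized squared error of the best `levels`-level scalar quantizer."""
    xs = sorted(x)
    n = len(xs)
    # Prefix sums of values and squares give each cell's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(xs):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a: int, b: int) -> float:
        """Squared error of xs[a:b] around its own mean (a < b)."""
        total = s1[b] - s1[a]
        return max(0.0, (s2[b] - s2[a]) - total * total / (b - a))

    inf = float("inf")
    best = [0.0] + [inf] * n      # best[j]: min cost of first j points so far
    for _ in range(min(levels, n)):
        new = [inf] * (n + 1)
        new[0] = 0.0
        for j in range(1, n + 1):
            # Last cell covers xs[i:j]; earlier points use the previous table.
            new[j] = min(best[i] + cost(i, j) for i in range(j))
        best = new
    return best[n] / n
```

The run time is $O(M n^2)$, which is adequate for checking small examples; it also makes plain why the benchmark needs the entire sequence in advance.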

Theorem 1 For any $R = \log M$ there exists a randomized
zero-delay sequential source code $\{f_i, g_i\}_{i=1}^{\infty}$ of rate $R$ whose
expected normalized cumulative distortion $\bar{D}_n(x^n)$ satisfies,
for all $x^n \in [0, 1]^n$,

$$\bar{D}_n(x^n) - D_n^*(x^n) \le C n^{-1/5} \log n,$$

where $C$ is a constant independent of $n$ and $x^n$. In particular,

$$\limsup_{n \to \infty} \max_{x^n \in [0,1]^n} \left( \bar{D}_n(x^n) - D_n^*(x^n) \right) \le 0.$$

The  construction  of  the  coding  scheme  in  the  theorem  uses 
an  appropriately  modified  version  of  the  exponential  weighting 
method  of  Vovk  [1]  in  which  the  class  of  “experts”  is  a  finite 
set  of  judiciously  chosen  reference  quantizers.  Ideally,  the  cu¬ 
mulative  losses  of  these  experts  should  be  used  to  form  the 
weights  in  the  exponential  weighting  scheme.  A  substantial 
difficulty  is  that  these  losses  are  not  available  at  the  decoder 
since  (unlike  in  sequential  lossless  coding)  the  decoder  does 
not  have  access  to  the  past  source  outputs.  We  overcome  this 
problem  by  periodically  transmitting  approximate  versions  of 
the  cumulative  losses  of  the  reference  quantizers.  We  show 
that  using  only  a  small  fraction  of  the  overall  available  rate 
to  transmit  the  approximate  cumulative  losses,  the  proposed 
scheme  does  asymptotically  as  well  as  a  hypothetical  scheme 
in  which  the  decoder  has  full  access  to  the  cumulative  losses 
of  the  reference  quantizers  (such  a  scheme  requires  a  channel 
of  infinite  capacity). 
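
The role of the cumulative losses can be illustrated with a minimal Python sketch of exponential weighting over a finite set of reference quantizers. It models only the hypothetical scheme in which the cumulative losses are known exactly; the periodic transmission of approximate losses is not modeled. The expert set, the learning rate `eta`, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def quantize(points, x):
    """Map x to the nearest code point of a scalar quantizer."""
    return min(points, key=lambda p: abs(p - x))


def best_expert_distortion(xs, experts):
    """Smallest normalized cumulative squared error over the expert set."""
    return min(
        sum((x - quantize(points, x)) ** 2 for x in xs) / len(xs)
        for points in experts
    )


def exp_weighting_distortion(xs, experts, eta, rng):
    """Normalized distortion of a randomized scheme that, at each time,
    draws an expert with probability proportional to
    exp(-eta * cumulative loss) and quantizes the sample with it."""
    losses = [0.0] * len(experts)
    total = 0.0
    for x in xs:
        base = min(losses)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (loss - base)) for loss in losses]
        chosen = rng.choices(range(len(experts)), weights=weights)[0]
        total += (x - quantize(experts[chosen], x)) ** 2
        # Idealized step (assumption): every expert's loss is known exactly.
        for j, points in enumerate(experts):
            losses[j] += (x - quantize(points, x)) ** 2
    return total / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(2000)]
    # Hypothetical 2-level reference quantizers on [0, 1].
    experts = [[0.25, 0.75], [0.1, 0.9], [0.4, 0.6]]
    print(best_expert_distortion(xs, experts))
    print(exp_weighting_distortion(xs, experts, eta=5.0, rng=rng))
```

Comparing the two printed values on a given sequence gives a rough feel for the gap that the theorem bounds; the sketch makes no claim about the rate of convergence.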

Abstract — In this work we consider the problem of
determining the redundancy of successive refinement
codes and codes with side information, as a function
of their blocklength. It is shown that successive
refinement codes accumulate an O(log n / 2n) redundancy
term at each stage of the encoding process, which
may result in a considerable degradation of the final
description. A redundancy result for codes with side
information is also presented.

I.  Introduction 

The redundancy of a code is the difference between the average
performance of the code and the theoretically expected
performance. A smaller redundancy is achieved as the
blocklength increases. However, increasing the blocklength
results in an exponential increase in coding complexity.
Therefore, it is interesting to study the tradeoff between the
redundancy and the computational complexity.

In lossy source coding, the redundancy of a code at rate
R is the difference between its average distortion and the
distortion-rate function. The redundancy problem
in lossy source coding was originally considered by Pilc. However,
the problem was finally solved only a few years ago by
Zhang et al. [2]. It has been shown that in the coding of
a discrete memoryless source {X_i}_{i=1}^{\infty}, X \sim p_X, the
distortion redundancy of a code with a fixed blocklength n is
|\frac{\partial}{\partial R} d(p_X, R)| \ln n / 2n + o(\ln n / n), where \frac{\partial}{\partial R} d(p_X, R) is the
partial derivative evaluated at R and assumed to exist.

II.  Main  Results 

The  Redundancy  of  Successive  Refinement  Codes 

Successive refinement is a coding method that progressively
improves previously obtained descriptions of the original data
using additional information. The problem arises in a variety
of applications in which a coarse representation of the data is
always transmitted and, occasionally, a finer reproduction is required.
Furthermore, a successive refinement scheme is also a technique
for fast encoding, since it possesses a tree structure. A Tree
Structured Vector Quantizer (TSVQ) may be constructed following
the successive refinement approach, which reduces the
computational complexity exponentially. Moreover, it seems
that for successively refinable sources [1] there is no penalty
due to the multi-stage encoding. Nevertheless, even for sources
that are not successively refinable, this technique is still
computationally efficient. The investigation of the redundancy
is therefore crucial for estimating the performance of
successive refinement codes as well as for analyzing fast
encoding schemes such as TSVQ.


The i-th stage redundancy of an optimal K-stage successive
refinement code is given by the following theorem.

Theorem 1: Let R_i > 0 be the rate of stage i, i = 1, ..., K.
For any discrete memoryless source X \sim p_X and K distortion
levels d_1 > ... > d_K > 0, the distortion redundancy of stage
i, associated with an optimal code scheme, is

Vi(px,Ri,R2,  ■  ■  ■  ,Rk,ti) 


where  blocklength  n  is  sufficiently  large. 

The  Redundancy  of  Codes  with  Side  Information 
Another closely related problem is the redundancy problem
of codes with side information. It arises when there exists
a joint source (X, Y), where X is referred to as the source,
while Y is referred to as the side information and is available
at the decoder. The decoder reproduces the source using its
knowledge of the side information. The redundancy of an
optimal code with side information is given by the following
theorem.


Theorem 2: Let R_Y > 0 be the rate of the code. For any
joint discrete memoryless source (X, Y) \sim p_{XY} and a distortion
level d_Y > 0, the distortion redundancy associated with an
optimal side information code operating at rate R_Y is


^y(PXY,Ry,n) 


dRy 


dy(pXY,Ry) 


where  blocklength  n  is  sufficiently  large. 


III.  Conclusion 

An interesting consequence of our result is that any multistage
encoding scheme accumulates redundancy at
each stage, which may lead to a significant increase in the
overall distortion. An important example of this phenomenon
is TSVQ. Practical implementations of TSVQ show that it
can never achieve the performance of a block code, even for
successively refinable sources. Until this work there was no
theoretical understanding of this fact.

Abstract — Given an achievable quadruple
(R_1, R_2, D_1, D_2) for successive refinement with D_2 < D_1,
the rate loss at step i is defined as L_i = R_i - R(D_i). It is
shown that for a memoryless source and for MSE, an
achievable quadruple can be found such that L_i < 1/2
bit. Moreover, an achievable quadruple can be found
with L_2 arbitrarily small and L_1 < 1/2 bit if D_2 is small
enough. If an information-efficient description at D_1
is required (i.e., L_1 = 0), then there exists an achievable
quadruple with L_2 < 1 bit. The results are independent
of both the source and the particular D_1, D_2
requirements and extend to any difference distortion
measure. The techniques employed parallel Zamir’s
bounding of the rate loss in the Wyner-Ziv problem.

I.  Introduction 

If one can design two-step compression systems that incur no
rate loss relative to optimal one-step coding (i.e., if there
is an achievable quadruple (R_1, R_1 + \Delta R, D_1, D_2) such that
R_1 = R(D_1) and \Delta R = R(D_2) - R(D_1)), the source is said
to be successively refinable (SR) [1]. In [2] Koshelev introduced
the notion of divisibility, and argued that successive
refinement is possible if there exists a channel Q_{U_2,U_1|X} such
that i) the random variables U_1 and U_2 defined through this
channel achieve the rate-distortion function of X at distortions
D_1 and D_2, respectively, and ii) the Markov relation
X \to U_2 \to U_1 is satisfied. Necessity was later proved by
Equitz and Cover [1], who also used the Gerrish problem as an
example of a non-SR source with discrete alphabet.

Rimoldi  [3]  determined  the  achievable  region  for  two-step 
compression  of  a  discrete-alphabet  memoryless  source.  In  a 
subsequent  paper,  Effros  [4]  extended  Rimoldi’s  results  to  han¬ 
dle  stationary  sources  and  Polish  alphabets. 

Given an achievable quadruple (R_1, R_2, D_1, D_2), we define
the rate loss at the ith stage as

L_i = R_i - R(D_i), i \in {1, 2}.

Let D_1 and D_2 be fixed (with D_2 < D_1). For a successively
refinable source, there exists an achievable quadruple
for which L_1 = 0 and L_2 = 0. For a non-successively refinable
source, it is not possible to find achievable quadruples for
which L_1 and L_2 are zero simultaneously. It is therefore
important to investigate whether L_1 and L_2 can be made
small simultaneously for any given source. Effros [4] computed
for the Gerrish problem that the smallest possible rate loss L_2
when forcing L_1 = 0 is a relatively small fraction of a bit for
a fixed D_1 as D_2 varies.
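
To make the definition of the rate loss concrete, the following Python sketch evaluates L_i for a Gaussian source under MSE, whose rate-distortion function R(D) = (1/2) log2(sigma^2 / D) for 0 < D <= sigma^2 is standard. The function names and the example rates are illustrative assumptions, and the code does not check whether a given quadruple is achievable.

```python
import math


def gaussian_rate_distortion(distortion, variance):
    """R(D) in bits for a Gaussian source under squared error."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(variance / distortion))


def rate_loss(rate, distortion, variance):
    """L = R - R(D): excess rate over the rate-distortion function."""
    return rate - gaussian_rate_distortion(distortion, variance)


if __name__ == "__main__":
    variance = 1.0
    d1, d2 = 0.25, 0.0625  # distortion targets with D_2 < D_1
    r1, r2 = 1.0, 2.3      # hypothetical rates at stages 1 and 2
    print(rate_loss(r1, d1, variance))  # stage-1 rate loss
    print(rate_loss(r2, d2, variance))  # stage-2 rate loss
```

Since the Gaussian source with squared error is successively refinable [1], rates with both losses equal to zero are achievable in that case; for other sources the same two-line definition applies with the appropriate R(D).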

1This  work  was  partially  supported  by  a  CONACYT  (MEXICO) 
doctoral  grant  (110412/110461) 

In this paper we provide source-independent bounds for the
rate loss at both stages. The main result is that for squared
error as the distortion measure, it is always possible to find an
achievable quadruple for which the rate loss satisfies L_1 < 1/2
bit and L_2 < 1/2 bit, a result which is independent of the
source and of the D_1, D_2 requirements.

II.  Summary  of  results 

We assume that D_1 and D_2 are some fixed distortion requirements
satisfying D_2 < D_1. We will also assume an MSE
distortion measure.

Theorem 1 There exists an achievable rate pair (R_1, R_2)
with L_i = R_i - R(D_i) < 1/2 bit, i \in {1, 2}.

Theorem 2 Let D_1 be fixed and let D_2 \to 0 as k \to \infty.
There exists a sequence of achievable quadruples
{(R_1, R_2)}_{k=1}^{\infty} with L_1 < 1/2 bit and \lim_k L_2 = 0.

Theorem 3 There exists an achievable rate pair (R_1, R_2)
with L_1 = 0 and L_2 < 1 bit.

Theorem 4 Let the iid source {X_i}_{i=1}^{\infty} have variance \sigma_X^2.
There exists an achievable rate pair (R_1, R_2) for which

Li  <  ^log2  ^2  -  bits,  ie{  1,2}. 

Theorem 5 Let the iid source {X_i}_{i=1}^{\infty} have mean \mu_X and
variance \sigma_X^2 and let X^* \sim \mathcal{N}(\mu_X, \sigma_X^2). There exists an achievable
rate pair (R_1, R_2) with L_i < D(X \| X^*), i = 1, 2.

Acknowledgments 

The  authors  wish  to  thank  Ram  Zamir  who,  after  seeing  our 
Theorem  1,  suggested  the  ideas  behind  Theorems  2  and  5.  We 
also  thank  Ying-On  Yan  for  his  careful  reading  of  a  draft  of 
this  manuscript. 

References 

[1] W. H. Equitz and T. Cover, “Successive Refinement of Information,”
IEEE Trans. Inform. Theory, 37(2), pp. 269-275, March 1991.

[2] V. Koshelev, “Hierarchical Coding of Discrete Sources,” Probl.
Pered. Inform., 16(3), pp. 31-49, 1980.

[3] B. Rimoldi, “Successive Refinement of Information: Characterization
of the achievable rates,” IEEE Trans. Inform. Theory,
40(1), pp. 253-259, January 1994.

[4] M. Effros, “Distortion-rate bounds for fixed- and variable-rate
multiresolution source codes,” IEEE Trans. Inform. Theory,
45(6), pp. 1887-1910, September 1999.


0-7803-5857-0/00/$10.00 ©2000 IEEE.



ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


A Broadcast Approach for the Multiple-Access Slow Fading Channel

Shlomo Shamai (Shitz)

EE Dept., Technion

Haifa 32000, Israel, sshlomo@technion.ac.il


Abstract — A ‘single-user’ based broadcast approach
is adapted for the multiple access very slow fading
channel. This strategy makes it possible to adapt the
reliably conveyed rate to the actual channel conditions
experienced by each of the users without any
feedback links to the transmitters. This strategy
simultaneously implements a continuum of capacity
region vs. outage pairs rather than a single value, as is
the case in the standard approach. We specifically address
expected rates and outages, which are compared
to ergodic capacities and also to the capacity vs. outage.
The main results, presented and demonstrated for the
two-user independent Rayleigh fading channel, are
extended to the general multiple access slowly fading
channel.

I.  Model,  Assumptions  and  Preliminaries 

We address here the standard Multiple Access Channel
(MAC) model subjected to static fading, which is not
necessarily independent among the K users. Complex notation
is used throughout. Here y_i, the received signal at discrete
time instant i, i = 1, 2, ..., N, equals y_i = \sum_{l=1}^{K} p_{li} x_{li} + n_i.
The i-th coded symbol of the l-th user is designated by x_{li}
and n_i stands for the i-th iid additive Gaussian noise sample
with variance E|n_i|^2 = 1. The fading power associated with
user l is designated by s_{li} = |p_{li}|^2, l = 1, 2, ..., K, and is
assumed to be static (s_{li} = s_l). The realizations of the fading
coefficients {p_l} are not available to the transmitters or the
receiver, which are aware only of the underlying statistical
law. We adhere henceforth to the single block fading
channel model where the block length satisfies N \gg K, giving rise to
equal achievable rates whether or not channel state information
is available at the receiver.

In parallel to the single-user case [1], the capacity vs.
outage for a K-user system is associated with the event of
(s_1, s_2, ..., s_K) satisfying simultaneously the multiuser equation
set for achievable rates, where the signal-to-noise ratio
also reflects the interference emerging from those users who
do not belong to the decodable set [2]. The availability
probability is associated with the simultaneous satisfaction of the
equation set, and outages are associated with the complementary
event. For equal rates these probabilities for the two-user
and many-user cases have been investigated in [2]. Clearly,
expected rates are naturally associated with outage probabilities.

II.  The  Broadcast  Approach  Channel 

Single-User: Assume now that the fading power random
variable s is continuous and let R(s) stand for the reliably
conveyed information rate at fading level s, which designates a
certain realization of the fading (power) random variable. The
transmitter views the fading channel as a degraded Gaussian
broadcast channel with a continuum of receivers, each
experiencing a different signal-to-noise ratio specified by s \cdot SNR.
The receiver which experiences a realization s is able to
decode its own data stream (indexed by s) and all those streams
indexed by u < s (intended to be decoded at receivers with
lower signal-to-noise ratios u \cdot SNR). Within this framework, in
[3] the achievable rates, expected rates and outages have been
studied and the power assignment u(s) has been optimized.

Two-Users: Let \omega_k stand for the effective SNR of user
k = 1, 2. It can be shown that {\omega_k} is given by the solution
of  the  equation  pair  on  =  w2  =  where 

yk{s)  =  Uk(u)du,  k  =  1,2.  We  express  ui\  =  u>i(si,  s2), 
oj2  =  o)2 (si,  s2)  as  explicit  functions  of  the  actual  fading  real¬ 
izations  (si,  s2)  for  specified  power  assignments  v\  (,s),  u2(s). 
The  simultaneously  achievable  rates  of  each  of  the  users 
Ri(si,  s2),  R2(s i,  s2)  respectively  depend  now  on  both  fad¬ 
ing  realizations  si  and  s2,  and  are  given  by  Rk(si,  s2)  = 
ju>k(ai,s2)  _  i  2.  The  expected  rates  are  now 

R* T  =  E(flt(*i,  52))  =  /“  (l  -  Fu»)  where 

F_{\omega_k}(u) designates the probability distribution of the random
variable \omega_k(s_1, s_2). Also here the functions y_1(u), y_2(u) can
be optimized so as to maximize the expected rates, or the total
expected throughput R_{1T} + R_{2T}. In parallel to the single-user
case, also here expected rates per outages can be considered
by replacing the original probability distribution of the fading
powers F_{s_1,s_2}(\alpha, \beta) by F^{s_0}_{s_1,s_2}(\alpha, \beta), the conditional
distribution function of s_1, s_2, conditioned on the event (s_1, s_2) \notin s_0,
where the associated outage probability is Prob((s_1, s_2) \in s_0).
A natural candidate for a suboptimal symmetric power
distribution for the two-user case is a modification of the optimal
single-user distribution found in [3].

K-Users: The approach extends straightforwardly to the general
K-user case. Let now k = 1, 2, ..., K, where u_k(s) stands for the
power distribution of the k-th user, all subjected to the same
average power constraint, SNR. The strategy induces a set of
A-nonlinear  equations  w *  =  s*,(l  +  ■ 

The achievable rates associated with user k for a given
realization of s = (s_1, s_2, ..., s_K) and the expected rates
are still given by the former single-user based equations with
\omega_k = \omega_k(s). The results assume a compact form for large
systems, K \gg 1. The general approach does not demand, in
fact, independence among the fadings affecting all the users,
making the current approach and analysis rather general and
robust. Some interesting examples related to a variety of
interference-type channels are explored.

Abstract — Consider M independent users, each
having his own transmit antenna, that transmit
simultaneously to one receiver antenna through
a Rayleigh block-fading channel having a coherence
interval of T symbols, with no channel state
information available to either the transmitters or the
receiver. The total transmitted power is independent
of the number of users. For a given coherence time
T, we wish to identify the best multi-access strategy
that maximizes the total throughput, where all users
are subjected to the same average power constraint.

If perfect channel state information were available
to the receiver, it is known that the total capacity
increases monotonically with the number of users. If
the channel state information is available to both the
receiver and all transmitters, the throughput-maximizing
strategy implies that only the single user that
enjoys the best channel condition transmits. In the
absence of any channel state information one is forced
to a radically different conclusion. In particular we
show that if the propagation coefficients take on new
independent values for every symbol (i.e., T = 1) then
the total capacity for any M > 1 users is equal to the
capacity for M = 1 user, in which case TDMA is an
optimal scheme for handling multiple users. This result
follows directly from a recent treatment of the
single-user multiple antenna block-fading channel.

Again, motivated by the single-user results, one is
led to the following conjecture for the multiple user
case: for any T > 1 the maximum total capacity can
be achieved by no more than M = T users. The conjecture
is supported by establishing the asymptotic
result that, for a constant M/T for large T, the total
capacity is maximized when M/T \to 0, which yields a
total capacity per symbol of \log(1 + \rho), where \rho is the
expected SNR at the receiver.

I.  Signal  Model 

We  use  a  block-fading  model  [1],  with  coherence  interval 
T,  where  M  independent  users  simultaneously  transmit  to  a 
single  receiver  antenna  in  a  flat-fading  environment,  where 
each  user  has  sole  access  to  one  of  M  transmit  antennas,  and 
where  nobody  has  any  CSI.  During  each  coherence  interval, 
the  M  users  collectively  transmit  a  T  x  M  complex  matrix  S, 
whose  columns  are  statistically  independent,  and  the  receiver 

1This research was performed, in part, while the author was visiting
the Mathematical Sciences Research Center, Bell Laboratories,
Lucent Technologies


records a T x 1 complex vector X,

X = \sqrt{\rho / M} S H + W, (1)

where H is the M x 1 complex-valued propagation vector, and
W is a T x 1 vector of additive receiver noise. All components
of H and W are independent Gaussian CN(0, 1). The expected
SNR is equal to \rho, subject to the power constraint

tr E{S S^\dagger} = TM. (2)

Our goal is to maximize the mutual information I(X; S), without
any CSI, subject to 1) the power constraint (2), and 2)
the statistical independence of the columns of S.
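
As a sanity check on the normalization in (1) and (2), the following Python sketch simulates the model and estimates the ratio of received signal power to noise power, which should be close to the expected SNR. The choice of unit-modulus entries for S (one simple signal that satisfies the power constraint and has independent columns) and all function names are illustrative assumptions.

```python
import math
import random


def complex_gaussian(rng):
    """One CN(0, 1) sample: real and imaginary parts with variance 1/2 each."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def unit_modulus(rng):
    """A random point on the unit circle, so every entry of S has power 1."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return complex(math.cos(phase), math.sin(phase))


def simulate_block(rho, T, M, rng):
    """Return the signal and noise parts of X for one coherence interval."""
    S = [[unit_modulus(rng) for _ in range(M)] for _ in range(T)]
    H = [complex_gaussian(rng) for _ in range(M)]  # fixed within the block
    scale = math.sqrt(rho / M)
    signal = [scale * sum(S[t][m] * H[m] for m in range(M)) for t in range(T)]
    noise = [complex_gaussian(rng) for _ in range(T)]
    return signal, noise


def estimate_snr(rho, T, M, blocks, rng):
    """Empirical ratio of received signal power to noise power."""
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(blocks):
        signal, noise = simulate_block(rho, T, M, rng)
        signal_power += sum(abs(v) ** 2 for v in signal)
        noise_power += sum(abs(v) ** 2 for v in noise)
    return signal_power / noise_power


if __name__ == "__main__":
    print(estimate_snr(rho=4.0, T=4, M=2, blocks=4000, rng=random.Random(0)))
```

The sketch only checks the power normalization; it does not compute mutual information or capacity.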

II.  Capacity  for  T  =  1;  no  CSI 

An upper bound on capacity is obtained by permitting the
columns of S to be statistically dependent. This leads directly
[2] to the conclusion that, when T = 1, the capacity for M > 1
users is equal to the capacity for M = 1 user. In contrast, if
perfect CSI were available to the receiver, the total M-user
capacity would be equal to the single-user/M-antenna capacity
[3], and in case CSI is also available to the transmitters,
channel-controlled TDMA is optimal [4].

III.  Conjecture  for  T  >  1;  no  CSI 

For the general case T > 1, a conjecture is that the total
capacity for any M > T is equal to the total capacity for
M < T. At present we are unable to prove this conjecture, but
we make some headway by studying the case where T and M
grow large. In this case, with M/T \to 0, the asymptotic mutual
information is T \log(1 + \rho), which is equal [3] to the capacity
where a single user has access to an unlimited number of
transmit antennas, with perfect CSI available to the receiver.
This result strongly supports our conjecture.

Abstract — We consider a Direct Sequence Code
Division Multiple Access (DS-CDMA) channel in colored
additive Gaussian noise and focus on the sum
capacity of this channel. Sum capacity is the maximum
sum of rates at which users can jointly reliably
transmit, in an information theoretic sense. We completely
characterize the optimum sum capacity, which is
obtained by choosing the signature sequences of the
users appropriately. Our characterization is constructive
in that we provide a combinatorial algorithm to
generate the optimum signature sequences as a function
of the covariance of the additive background noise
and the power constraints of the users. The characterization
also allows us to identify a saddle property of
the optimum sum capacity: convexity in the covariance
matrix of the additive noise and concavity in the
vector of user power constraints.

I.  Introduction  and  Problem  Statement 
A discrete-time baseband DS-CDMA channel without fading (with
short signature sequences) is the following:

y(n) = \sum_{i=1}^{K} x_i(n) s_i(n) + w(n).

Here K denotes the number of users and n the channel use
instant. The user symbols are denoted by x_i(n), and y(n) is
the signal (thought of as an N-dimensional vector, N being
the processing gain or number of chips per symbol) at the
receiver at time instant n. Here w(n) is an additive Gaussian
noise with covariance matrix \Sigma. Each user i is subject to a
time-averaged power constraint of p_i. We denote by D the
diagonal matrix of the user power constraints.

Our focus will be on sum capacity: the sum of rates at which
users jointly reliably communicate. These rates are time
averaged, with the power constraint on the users also averaged
in time. A generalization of the results in [2] to the colored
noise case allows us to write the following expression for the sum
capacity of the DS-CDMA channel with signature sequences
S \stackrel{def}{=} [s_1 ... s_K]:

C_{sum}(S, D, \Sigma) = \frac{1}{2} \log \det(I + \Sigma^{-1} S D S^T).

Our main focus in this paper is to characterize the maximum
sum capacity:

C_{opt}(D, \Sigma) \stackrel{def}{=} \max_{S \in \mathcal{S}} C_{sum}(S, D, \Sigma),

where \mathcal{S} is the set of all N x K real matrices with all columns
having l_2 norm equal to 1. Observe that C_{sum} is a continuous
function defined on a compact set \mathcal{S} and thus the use of max
in the above is justified.
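
The expression for C_sum can be evaluated numerically for a given choice of signatures. The following Python sketch does so using the identity det(I + \Sigma^{-1} A) = det(\Sigma + A) / det(\Sigma), which avoids an explicit matrix inverse. The function names are illustrative assumptions, the logarithm is taken as natural (the text does not fix the base), and the sketch does not perform the maximization over signature sequences.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def sum_capacity(S, powers, Sigma):
    """C_sum = (1/2) log det(I + Sigma^{-1} S D S^T), in nats.

    S is an N x K list of lists whose columns are the signatures,
    powers holds the diagonal of D, and Sigma is the N x N noise covariance.
    """
    n, k = len(S), len(powers)
    # A = S D S^T, built entry by entry.
    A = [[sum(S[i][u] * powers[u] * S[j][u] for u in range(k))
          for j in range(n)] for i in range(n)]
    total = [[Sigma[i][j] + A[i][j] for j in range(n)] for i in range(n)]
    return 0.5 * (math.log(det(total)) - math.log(det(Sigma)))


if __name__ == "__main__":
    # Two users with orthogonal unit-norm signatures in white noise.
    S = [[1.0, 0.0], [0.0, 1.0]]
    Sigma = [[1.0, 0.0], [0.0, 1.0]]
    print(sum_capacity(S, [1.0, 1.0], Sigma))
```

Evaluating the function for different unit-norm signature matrices gives a simple way to compare candidate sequences against each other.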

1This  work  was  supported  by  NSF  under  grant  IRI  97-12131. 


II.  Main  Results 

Our main result is a complete characterization of C_{opt} as a
function of D and \Sigma. This characterization is constructive in
the sense that we develop a combinatorial algorithm to generate
the optimum signature sequences (these achieve the maximum
sum capacity). The details of this result are available in
[3]. In this summary, we briefly describe a qualitative property
of the optimum sum capacity that emerges out of our
characterization. Our first result is a saddle property of the
optimum sum capacity:

Theorem 1 For every fixed \Sigma, C_{opt}(D, \Sigma) is a concave function
in D and a convex function in \Sigma for every fixed D.

We can strengthen this result using the partial order of Schur
majorization on vectors in \mathbb{R}^N. We say that a vector a majorizes
another vector b if their components have the same sum
and the components of a are “more spread out” than those of
b. For example, every vector in \mathbb{R}^N with sum N majorizes
the vector with all components unity. An exhaustive resource
for results on this partial order is [1]. We show that the optimum
sum capacity is a Schur-saddle function in the following
sense. Below we have denoted the vector of eigenvalues of \Sigma
by (\sigma_1^2, ..., \sigma_N^2).
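
The informal phrase “more spread out” has a standard formal counterpart: a majorizes b if the two vectors have equal sums and, after sorting both in decreasing order, every partial sum of a is at least the corresponding partial sum of b. A minimal Python sketch of this check follows; the function name and the numerical tolerance are illustrative assumptions.

```python
def majorizes(a, b, tol=1e-9):
    """Return True if vector a majorizes vector b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_sorted = sorted(a, reverse=True)
    b_sorted = sorted(b, reverse=True)
    # The totals must agree for the two vectors to be comparable.
    if abs(sum(a_sorted) - sum(b_sorted)) > tol:
        return False
    partial_a = 0.0
    partial_b = 0.0
    for x, y in zip(a_sorted, b_sorted):
        partial_a += x
        partial_b += y
        # Each leading partial sum of a must dominate that of b.
        if partial_a < partial_b - tol:
            return False
    return True


if __name__ == "__main__":
    print(majorizes([3.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
    print(majorizes([1.0, 1.0, 1.0], [3.0, 0.0, 0.0]))
```

The first call illustrates the example from the text with N = 3: a vector with sum 3 compared against the all-ones vector.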

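Theorem 2 1. For every fixed D, C_{opt}(D, \tilde{\Sigma}) \ge
C_{opt}(D, \Sigma) for every \tilde{\Sigma}, \Sigma such that (\tilde{\sigma}_1^2, ..., \tilde{\sigma}_N^2)
majorizes (\sigma_1^2, ..., \sigma_N^2).

2. For every fixed \Sigma and for every \tilde{D}, D such
that (\tilde{p}_1, ..., \tilde{p}_K) majorizes (p_1, ..., p_K) we have
C_{opt}(D, \Sigma) \ge C_{opt}(\tilde{D}, \Sigma).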

Abstract — The problem of maximizing a weighted
linear combination of the rates of users in a multiuser
synchronous CDMA system is considered. We find
that although spreading decreases capacity, nontrivial
low rate coding can help to mitigate this loss.

I.  Introduction  and  Motivation 

Massey  [1]  proposed  a  novel  definition  of  a  spread  spectrum 
system  as  one  in  which  the  Fourier  bandwidth  W,  defined  as 
the  “support”  of  the  Fourier  transform,  is  much  greater  than 
the  Shannon  bandwidth  B,  defined  as  one-half  the  number  of 
dimensions  of  signal  space  used  per  second. 

Following [1], which dealt with single-user communication,
we study multiuser CDMA communication systems from
an information theoretic perspective. The sum capacity was
studied and characterized in [2, 3]. In this paper, we consider
the problem of maximizing an arbitrary linear combination of
the users’ rates over multiuser capacity regions.

II.  Multiuser  Capacity  Regions 

We assume a K-user additive white Gaussian noise
(AWGN) channel with usable bandwidth W, noise PSD N_0/2,
and average power constraint P_i for the ith user.

The capacity region, i.e., the set of rates at which reliable
communication is possible, for unconstrained signaling
(no spreading) is well known [4] and defined by the constraints:

0 \le \sum_{i \in J} R_i \le W \log_2 ( 1 + \sum_{i \in J} \frac{P_i}{N_0 W} ) bits/sec, (1)

where J is a nonempty subset of {1, ..., K}. We will denote
this capacity region with no spreading as C_{ns}.

The  capacity  region  for  symbol  synchronous  CDMA  with 
spreading  factor  N  =  ^  is  defined  by  the  constraints  [5]: 

0<X><Slog  (det  I|j|  +  ^  bits/sec,  (2) 

where |J| is the cardinality of J, I_k is a k x k identity matrix,
and R_J and P_J are the matrix of normalized cross correlations
and the diagonal matrix of received powers (P_i), respectively,
of the users in J. Since the capacity region for direct-sequence
CDMA depends upon the cross-correlations between the users’
spreading sequences, we will denote it as C_{ds}(R).

We also consider “naive” CDMA, in which all users are
assigned identical spreading sequences. Defining 1_K to be the
K x K matrix of all ones, we note that C_{naive} = C_{ds}(R = 1_K).

A common performance metric is the sum capacity [2],
which is the maximum value of the sum of all users’ rates. The
general problem of maximizing an arbitrary linear combination
of the users’ rates is considered by defining the capacity
metric function: M(\lambda = [\lambda_1, ..., \lambda_K]) = \lambda [R_1, ..., R_K]^T.

1This  research  is  supported  in  part  by  NSF  Grants  CCR-9805885 
and  CCR-9733204  and  in  part  by  the  Intel  Foundation  Fellowship. 


III.  Results  and  Discussion 

1. The capacity regions are nested as follows:

C_{naive} \subseteq C_{ds}(R) \subseteq C_{ns} (3)

This immediately gives us, for any \lambda,

\max_{C_{naive}} M(\lambda) \le \max_{C_{ds}(R)} M(\lambda) \le \max_{C_{ns}} M(\lambda) (4)

2.  That  spreading  decreases  capacity,  suitably  defined  here  as 
the  maximum  of  a  linear  combination  of  the  users’  rates,  is  not 
surprising.  The  surprising  result,  also  noticed  in  [1],  is  that 
spreading need not decrease capacity substantially. Consider
the sum capacity, i.e., set \lambda = [1, ..., 1]. Letting the Shannon
bandwidth of the sum of the K users’ modulated signals
satisfy  B  =  a^,  where  a  >  0  and  P  is  the  average  power 
received  from  all  users,  a  simple  argument  shows 


\frac{C_{ds}(R)}{C_{ns}} = \frac{\max_{C_{ds}(R)} \{M([1, ..., 1])\}}{\max_{C_{ns}} \{M([1, ..., 1])\}} \ge \frac{a}{a + K}. (5)

3. A similar argument can be used to show that an arbitrary
linear combination of the users’ rates can be made close to the
maximum achievable:

\frac{\max_{C_{ds}(R)} \{M(\lambda)\}}{\max_{C_{ns}} \{M(\lambda)\}} \ge \frac{a}{a + K}. (6)

We make no assumptions on the spreading sequences or
the received powers of the users. Our results indicate that if
the Shannon bandwidth is large enough, spreading does not
entail a substantial loss in capacity. One way to increase the
Shannon bandwidth is to use nontrivial low rate coding [1].
However, (5) and (6) indicate that coding provides diminishing
returns, i.e., as the code rate decreases, the amount of
improvement decreases. The implication for the coding-spreading
tradeoff is that one should code to the point of diminishing
returns (say 80-90% of capacity) and use the remaining
bandwidth expansion for spreading.

Abstract — Acquisition is a very important step in
DS/SS communications systems. In this paper, we
describe several suboptimal schemes for parallel
noncoherent acquisition. Simulation results and
performance analysis are also summarized.

I.  Introduction 

In direct sequence spread-spectrum (DS/SS) communications
systems, the transmitter's signature sequence and the receiver's
replica of this sequence must be synchronized in order
to provide enough signal energy for reliable data demodulation.
The synchronization has two stages. In the first stage,
often referred to as coarse acquisition, the receiver's sequence
is synchronized to within some fraction of the chip duration
with the transmitter's sequence. In the second stage, the receiver
accomplishes and maintains fine alignment of the sequences
by using a code tracking loop. In this paper, we consider
only the coarse acquisition process. Our goal is to find
effective acquisition schemes which are also easy to implement.

II.  Estimation  of  Delay 

In noncoherent parallel acquisition schemes, the receiver
first computes, in parallel, the correlation of the received signal
with the locally generated in-phase and quadrature RF
carrier for each of the phases of the PN sequence. Next, the N
complex observations Z(i), where i = 0, 1, ..., N - 1, are used
to estimate the unknown delay between the local sequence and
the sequence in the received signal.

Optimal Estimator The optimal estimation scheme [1],
[2] minimizes $P_e$, the probability that the estimate of the true
delay differs from the true delay by more than half a chip
interval. $S_{opt}$ as given in [2] is computationally very intensive
and its performance is difficult to evaluate analytically.

Suboptimal Estimators Srinivasan and Sarwate [3] have
considered suboptimal estimators in which the delay $\delta = k + \epsilon$
(where $k = \lfloor \delta \rfloor$) is estimated in two steps. First, k is estimated
as $k_{est} = \arg\max_{i \in \{0, 1, \dots, N-1\}} |Z_i|$, and then $\epsilon$ is estimated in
the same manner as in $S_{opt}$ or the coherent version of $S_{opt}$ [1].
These schemes perform nearly as well as the optimal estimator,
but analytical evaluation of their performance is difficult.

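We have studied some two-stage suboptimal schemes that
estimate k as

$$k_{est} = \arg\max_{i \in \{0, 1, \dots, N-1\}} \left( |Z_i|^2 + |Z_{i+1}|^2 \right),$$

cf. [2], and $\epsilon$ from the ratio $|Z_{k_{est}}| / |Z_{k_{est}+1}|$. In particular,
$S_{rt3}$ uses

$$\frac{|Z_{k_{est}+1}|^2}{|Z_{k_{est}+1}|^2 + |Z_{k_{est}}|^2}$$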


as the estimate of $\epsilon$. The computational costs of these schemes
are much smaller than that of the optimal scheme. Moreover, because
of the simplicity of the decision statistics, analytical results
can be obtained.
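
The two-stage estimate described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the integer part is chosen by maximizing the energy of two adjacent correlator outputs, the index i + 1 is taken modulo N (a wrap-around that the summary does not state), and the fractional part is the energy ratio attributed to $S_{rt3}$. The function name `estimate_delay` is hypothetical.

```python
def estimate_delay(z):
    """Two-stage delay estimate from N complex correlator outputs z[0..N-1].

    Returns (k_est, eps_est): integer and fractional part of the delay in chips.
    """
    n = len(z)
    if n < 2:
        raise ValueError("need at least two correlator outputs")
    power = [abs(v) ** 2 for v in z]

    # Stage 1: integer part. Each phase is paired with its successor;
    # the cyclic wrap (i + 1) % n is an assumption of this sketch.
    k_est = max(range(n), key=lambda i: power[i] + power[(i + 1) % n])

    # Stage 2: fractional part as the share of energy in the later output.
    nxt = (k_est + 1) % n
    total = power[k_est] + power[nxt]
    eps_est = power[nxt] / total if total > 0 else 0.0
    return k_est, eps_est


if __name__ == "__main__":
    observations = [0.1, 0.2, 3 + 0j, 1j, 0.1]  # made-up correlator outputs
    print(estimate_delay(observations))
```

Because both stages use only squared magnitudes, the cost is linear in N, which is consistent with the low computational cost claimed for these schemes.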

III.  Performance  Analysis 
We have proved that $P_e$ for all the suboptimal schemes is
bounded above by a function that decreases exponentially
with increasing SNR. This implies that $P_{e,opt}$, the error probability
for $S_{opt}$, is also bounded by an exponentially decreasing function of
SNR.

We have studied the performance of the suboptimal
schemes by simulation. The figure below compares the error
probability performance of the optimal scheme and the four
suboptimal schemes. For $\epsilon = 0.25$, two of the schemes have
performance close to optimal. For other values of $\epsilon$, other
schemes are close to optimal. However, in all cases, $S_{rt3}$ is
close to the optimal.


Figure  1:  Pe  for  Sopt  and  four  suboptimal  schemes. 
Abstract - This paper analyzes the acquisition
scheme of WCDMA, a standard for the next generation
wireless systems, and characterizes its performance
under various channel conditions.

I.  Introduction 

WCDMA, a standard for 3G wireless systems, uses a three-step
search to acquire the asynchronous forward link. First,
the Primary Synchronization Code (PSC) is used to detect the
scramble code mask timing of the best cell site (slot timing)
using the proper matched filter. Next, the Secondary Synchronization
Code (SSC) is used to identify the scramble code
group by cross-correlation of the received signal with all the
Group Index (GI) code candidates used in the system. The
frame timing is also given by the use of comma free codes.
The final stage is the detection of the pilot by identifying the
scramble code belonging to the group specified by the SSC.

II.  Pilot  Detection  Technique 

The PSC and SSC are multiplexed, and are orthogonal
to each other, but not to the other forward link channels.
The PSC consists of a length 256 sequence having good aperiodic
correlation properties. The searcher coherently integrates
the received waveform over a 256 chip duration, and
non-coherently integrates a number of them. Then, it picks
the maximum as the required estimate. Note that, as the symbols
are transmitted with relatively low power, the signal needs
to be accumulated over multiple frames to provide enough energy
to successfully demodulate it. The SSC is essentially a
two layer code. The outer code provides the frame synchronization
information. The inner code provides information on
the GI of the pilot.

The SSC consists of Hadamard sequences chosen appropriately
and XOR-ed with the PSC. The frame timing is obtained
by using a comma-free code on top of it, i.e., a sequence of
short codes (SCs) is transmitted. These SCs are unmodulated,
of length 16, and are Comma Free, i.e., all their cyclic
shifts are unique. Thus the received cyclic shift of this sequence
provides information about frame timing. The Comma
Free code words are constructed from Reed-Solomon codes.
A (16,3) Comma Free Code has more than 300 possible code
words out of which only 32 are used, i.e., the process is scalable.
Based on the GI, one of 32 possible masks needs to be identified
for pilot acquisition. The pilot symbols are integrated
for 1024 chips, and can be analyzed in a manner similar to
that described in [1]. During initial acquisition, the MS first
demodulates the PSC, then the SSC, including the inner and
outer code. Finally, it demodulates the pilot to obtain synchronization.
If it is unable to find any pilot in the given GI,
it starts the process all over again until it succeeds.

III.  Overall  System  Performance 

Since all three stages are always carried out in this algorithm
before going back to the first stage, the average total
search time can be expressed as $T_s = T_c / (1 - P_e)$, where $T_c$ is
the amount of time it takes to go through the three stages, and
$P_e$ is the probability that one iteration of the three-step search
misidentifies the spreading code number and timing. Adding
the three stages, we use $T_c$ = (30 + 20 + 10) ms = 60 ms. $P_e$, the
misdetection probability for each search iteration, is given by:

$$P_e = P_f^{p} + (P_d^{p} \times P_f^{s}) + (P_d^{p} \times P_d^{s} \times P_f^{pilot}),$$

where $P_f^{p}$ denotes the false alarm probability in detecting the
PSC, $P_d^{p} = 1 - P_f^{p}$ denotes the probability of correct detection
of the PSC, $P_f^{s}$ denotes the false alarm probability in
detecting the SSC, $P_d^{s} = 1 - P_f^{s}$ denotes the probability of
correct detection of the SSC, and $P_f^{pilot}$ denotes the probability
of false alarm in detection of the pilot. Fig. 1 shows the search
performance for Rayleigh fading channels:


[Plot omitted: WCDMA Acquisition in Rayleigh Fading Channels]


Figure  1:  Acquisition  Time  in  WCDMA 
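
The search-time model of Section III can be checked numerically with a short Python sketch. It assumes that the garbled expression for the average search time reads $T_s = T_c/(1 - P_e)$, i.e., the mean of a geometric number of 60 ms iterations, and that the misdetection probability is the sum of the three per-stage failure events. The function names and the per-stage false alarm probabilities are hypothetical; the paper does not give numerical values for them.

```python
def misdetection_probability(pf_psc: float, pf_ssc: float, pf_pilot: float) -> float:
    """P_e for one three-stage search iteration.

    A failure occurs at the PSC stage, or at the SSC stage after a correct
    PSC detection, or at the pilot stage after both earlier stages succeed.
    """
    pd_psc = 1.0 - pf_psc
    pd_ssc = 1.0 - pf_ssc
    return pf_psc + pd_psc * pf_ssc + pd_psc * pd_ssc * pf_pilot


def mean_search_time_ms(p_e: float, t_c_ms: float = 60.0) -> float:
    """Average total search time T_s = T_c / (1 - P_e), in milliseconds."""
    if not 0.0 <= p_e < 1.0:
        raise ValueError("P_e must lie in [0, 1)")
    return t_c_ms / (1.0 - p_e)


if __name__ == "__main__":
    p_e = misdetection_probability(0.1, 0.1, 0.1)  # hypothetical stage values
    print(p_e, mean_search_time_ms(p_e))
```

Note that the three-term sum equals one minus the product of the three correct-detection probabilities, which is a quick consistency check on the formula.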


IV.  Conclusion 

In practice, the signal may be received at -21 dB at the
cell boundary for a Rayleigh fading channel. It can then
be seen that, for 90% reliability, the acquisition time in a
practical situation is close to 500 ms. This is in accordance
with the simulations shown in [2]. The practical deployment
scenario governs the requisite power needed for good system
performance. Details of the analysis are omitted here.
Abstract

In this communication we deal with symbol synchronization in channels with data dependent noise. Examples of such
channels arise in optical communications using APD receivers, or direct detection of optically amplified signals [1], where the
noise power is higher when a logical one is received. Another situation of data dependent noise arises in the detection of
signals in the presence of clutter [2,3]. In these systems the disturbance exhibits cyclostationary statistics that are ignored by
the conventional synchronizers designed for the additive Gaussian noise (AGN) channel, although this cyclostationarity
contains timing information that can be exploited to improve the tracking performance of the symbol synchronizer.

We consider a channel model where the additive disturbance corrupting the received signal consists of the sum of an
AWGN process and a cyclostationary component modeled by a Gaussian process with power proportional to the data symbol
transmitted. In spite of its simplicity, this model represents a good approximation for many direct detection optical systems
with APDs or in-line optical amplifiers. The maximum-likelihood (ML) data aided (DA) symbol synchronizer for this class
of channels is derived and its performance assessed. Comparing the new synchronizer against the well known MLDA
synchronizer designed for the AWGN channel shows that the new structure includes, in addition to the operations
performed by the AWGN-MLDA synchronizer, processing that exploits the cyclostationary characteristics of the additive
disturbance to enhance the accuracy of the time-delay estimation.

The tracking performance is derived assuming that the synchronizer is designed to operate with a small output jitter, and
consequently its behavior can be linearized. It is shown that the timing estimates produced by the maximum likelihood
synchronizer are unbiased provided the elementary data pulses exhibit time symmetry around their center, and consequently
in such cases the linearized performance achieves the Cramér-Rao bound. The performance of the new synchronizer is
compared against the one that would be achieved with the AWGN-MLDA structure when used in channels with data-dependent
noise. The results show that non-negligible (up to 6 dB) improvements are achieved with the new structure in
situations where there is a considerable asymmetry between the noise powers corresponding to a one or zero. The structure is
thus of interest for APD based optical communications.
Abstract - Pulse Position Hopping (PPH) is a new
promising multiple access technique which has several
benefits, such as coherent reception and low transmit
power, and it can be constructed to be near-far
robust. Analysis [1, 2] shows that it can reach an
order of several thousands of active users per cell. In
this paper we have estimated the effective capacity
for the uplink and the downlink communication in a
PPH spread spectrum system.

I. Introduction

We consider a PPH-CDMA system with K users. Let
$u^{(k)} = (u_0^{(k)}, u_1^{(k)}, \dots, u_{L-1}^{(k)})$, $u_i^{(k)} \in \{0, 1\}$ and k = 1, 2, ..., K,
be the information sequence of the kth user and $v^{(k)} = (v_0^{(k)}, v_1^{(k)}, \dots, v_{N-1}^{(k)})$,
$v_n^{(k)} \in \{0, 1\}$, be the corresponding code
sequence. The code rate is then r = L/N. The transmission
of the code symbols is divided into frames of length $T_f$.
Each active user transmits one code symbol in each frame.
The kth user's information rate is then $r/T_f$ (bit/s),
k = 1, 2, ..., K, independent of the user. The frame time is divided
into Q slots of length $\Delta$, $T_f = Q\Delta$, and a pseudo-random
"hopping" sequence $a_n^{(k)} \in \{0, 1, \dots, Q - 1\}$, n = 0, ..., N - 1,
provides a time shift within the nth frame.

II. PPH-CDMA

We have analyzed the transmission by rectangular pulses of
duration $T_c$ with unit energy and Gaussian shaped pulses,


such that


where $T_c$ determines the width of the pulse. The parameter $\gamma$
is chosen such that about 99% of the Gaussian pulse energy
is located in the interval $(-T_c/2, T_c/2)$. The kth transmitter's
output signal is

$$s^{(k)}(t) = \sum_{n=0}^{N-1} v_n^{(k)} h(t - nT_f - a_n^{(k)} \Delta),$$

and the received signal is

$$r(t) = \sum_{k=1}^{K} \sqrt{E^{(k)}}\, s^{(k)}(t - \delta^{(k)}) + n(t),$$

where $E^{(k)}$ is the energy of the received signal from the kth
user, $\delta^{(k)}$ is the time offset, and n(t) is the additive white Gaussian
noise. Assuming that the system has perfect power control,
i.e., $E^{(k)} = E$, k = 1, ..., K, and perfect synchronization,
the nth output of the correlation receiver and input to the
decoder, for the kth user, is

$$z_n^{(k)} = \int r(t) h(t - nT_f - a_n^{(k)} \Delta)\, dt = \sqrt{E} \Big( v_n^{(k)} + \sum_{k' \ne k} I_n^{(k,k')} \Big) + n_n,$$

where $I_n^{(k,k')}$ is the interference from the transmission of the
k'th user and $n_n$ is the background noise. As $Q \gg 1$ we neglect
the interference from pulses transmitted in other frames, and
by the assumption that the system is interference limited we
neglect the background noise. We estimate the interference
between the users and model it as white Gaussian noise, which
is reasonable as there are several thousands of users in each cell.

Given the parameters $\mu_0 = E[z_n^{(k)} | v_n^{(k)} = 0]$, $\mu_1 =
E[z_n^{(k)} | v_n^{(k)} = 1]$, and $\sigma^2 = var[z_n^{(k)}]$, the effective signal-to-interference
ratio (SIR) per time unit, $\eta$, is defined as

$$\eta = \frac{1}{T_f} \frac{(\mu_1 - \mu_0)^2}{2\sigma^2}.$$

In [2] we have shown that the overall effective system capacity
(in bits/s), C, can be lower bounded by C ≥ […]. For
Gaussian and rectangular pulses we get

C ≥ […] (Gaussian)

C ≥ […] (rectangular).

III. Numerical evaluation

We have investigated the performance of a PPH-CDMA system
transmitting Gaussian pulses and employing a concatenated
code with an inner first order Reed-Muller code and
an outer rate […] convolutional code. Simulation of this system
indicates that it can host more than 30 000 active users transmitting
at a bit rate of 10 kbit/s if $T_c$ is chosen to be 1 ns.
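
To make the frame and slot structure of the PPH transmitter concrete, the following Python sketch lists the time offsets $nT_f + a_n^{(k)}\Delta$ at which one user emits a pulse, following the transmitter output signal of Section II. It is a sketch under stated assumptions: a pulse is sent only in frames where the code symbol equals 1 (because the pulse is multiplied by the code symbol), and the function name and sample sequences are hypothetical.

```python
def pulse_offsets(code_bits, hop_sequence, frame_time, num_slots):
    """Pulse offsets n*T_f + a_n*Delta for one user, with Delta = T_f / Q."""
    if len(code_bits) != len(hop_sequence):
        raise ValueError("code and hopping sequences must have equal length")
    slot = frame_time / num_slots  # slot length Delta
    offsets = []
    for n, (bit, hop) in enumerate(zip(code_bits, hop_sequence)):
        if not 0 <= hop < num_slots:
            raise ValueError("hop value outside {0, ..., Q-1}")
        if bit == 1:  # a zero code symbol contributes no pulse
            offsets.append(n * frame_time + hop * slot)
    return offsets


if __name__ == "__main__":
    # Hypothetical example: N = 3 frames of length 8.0, Q = 8 slots.
    print(pulse_offsets([1, 0, 1], [2, 5, 0], frame_time=8.0, num_slots=8))
```

Each user has its own hopping sequence, so two users collide in a frame only when their pulses land in overlapping slots; this is the interference that Section II models as Gaussian noise.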

1This  work  was  supported  in  part  by  the  Foundation  for  Strate¬ 
gic  Research  -  Personal  Computing  and  Communication(PCC)  and 
Ericsson  Mobile  Communications. 

2  This  work  was  supported  in  part  by  the  Royal  Swedish 
Academy  of  Science  in  cooperation  with  the  Russian  Academy  of 
Science. 


References 

[1] M. Z. Win, R. A. Scholtz, Impulse Radio: How it works, IEEE
Communications Letters, vol. 2, no. 1, 1998.

[2] O. Wintzell, D. K. Zigangirov, K. Sh. Zigangirov, On the Capacity
of a Pulse Position Hopped CDMA System, submitted
for publication in IEEE Transactions on Information Theory.




0-7803-5857-0/00/$  10.00  ©2000  IEEE. 




ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


Differential  Phase  Shift  Keying  with  Constellation  Expansion  Diversity 

Lutz  H.-J.  Lampe,  Robert  F.H.  Fischer,  Johannes  B.  Huber 
Lehrstuhl für Nachrichtentechnik II, Universität Erlangen-Nürnberg, Cauerstr. 7/NT, D-91058 Erlangen, Germany


Abstract - A new differential encoding strategy is
introduced, which is shown to be advantageous for
bandwidth efficient transmission over flat Rician fading
channels when using multiple symbol differential
detection.

I.  System  Model  and  Differential  Encoding 

Consider a stationary, slowly time-varying, frequency non-selective
(flat) Rician fading channel. Channel state and carrier
phase offset are expected to be constant over a block of
at least N consecutive symbols, but not known at the receiver.
For such situations, differential phase encoding at the
transmitter and noncoherent demodulation at the receiver are
appropriate. To achieve higher spectral efficiencies, APSK constellations
are attractive, whose points are arranged in $\alpha$ distinct
concentric rings with radii $r_i$ and $\beta$ uniformly spaced
phases.

Because the received signal amplitude still provides information
on the transmitted amplitude, information should be
carried in the actual amplitude. But then, due to fading,
part of the information carried in the amplitude will be lost.
One possible approach to overcome this drawback and to exploit
the potential of amplitude modulation is to completely
map the information onto phase changes, and additionally, to
(partly) map the same information onto the amplitude of the
transmit symbols. This redundant mapping introduces diversity.

The most promising arrangement for the signal points is

$$\mathcal{A} = \left\{ c = r_{m \bmod \alpha}\, e^{j \frac{2\pi}{\alpha\beta} m} \;\middle|\; m = 0, \dots, \alpha\beta - 1 \right\}, \qquad (1)$$

because points whose phases differ by the minimum value
have different amplitudes.
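
The constellation in (1) can be generated directly, which makes the property "neighbouring phases use different amplitudes" easy to verify. The Python sketch below assumes that (1) reads $c_m = r_{m \bmod \alpha} e^{j 2\pi m/(\alpha\beta)}$ for $m = 0, \dots, \alpha\beta - 1$; the function name and the geometric radii with ratio 2 are illustrative assumptions.

```python
import cmath
import math


def constellation_a(alpha: int, beta: int, radii):
    """Points c_m = r_{m mod alpha} * exp(j*2*pi*m/(alpha*beta))."""
    if len(radii) != alpha:
        raise ValueError("need exactly one radius per ring")
    size = alpha * beta
    return [radii[m % alpha] * cmath.exp(2j * math.pi * m / size)
            for m in range(size)]


if __name__ == "__main__":
    # alpha = 4 rings, beta = 4 points per ring, assumed radii 1, 2, 4, 8.
    points = constellation_a(4, 4, [1.0, 2.0, 4.0, 8.0])
    for m, c in enumerate(points):
        print(m, round(abs(c), 3), round(math.degrees(cmath.phase(c)), 1))
```

Since the ring index cycles with m while the phase advances by one step, two points at adjacent phases always lie on different rings whenever the radii are distinct.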

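Given the data-carrying differential symbol $a = r_i e^{j \frac{2\pi}{\alpha\beta} m} \in
\mathcal{A}$ and the state $s = r_l e^{j \frac{2\pi}{\alpha\beta} n}$ of the differential encoder, the
current transmit symbol $x \in \mathcal{X}$ is calculated according to

$$x = r_i\, e^{j \frac{2\pi}{\alpha\beta} ((n + m) \bmod \alpha\beta)}. \qquad (2)$$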

The transmit signal constellation $\mathcal{X}$ now again consists of $\alpha$
amplitudes but $\alpha\beta$ phases. Due to the redundant mapping, $\mathcal{X}$
is expanded and the set $\mathcal{A}$ is a proper subset of $\mathcal{X}$. For $\alpha = 4$,
$\beta = 4$ the constellations $\mathcal{A}$ and $\mathcal{X}$ are shown in Figure 1.


Fig. 1: Signal constellations $\mathcal{A}$ (left) according to (1) and $\mathcal{X}$ (right)
for $\alpha = 4$, $\beta = 4$ (geometric ring spacing).


For slow fading channels we apply multiple symbol differential
detection [1], where the receiver processes blocks of N
consecutive receive symbols. Due to (ideal) interleaving at the
transmitter and deinterleaving at the receiver of vector symbols,
a (virtually) memoryless block fading channel is obtained.

II.  Numerical  Results 

For the AWGN channel and the Rayleigh fading channel the
achievable capacity is numerically evaluated as a function of
the (average) signal-to-noise ratio $E_s/N_0$ ($E_s$: average energy
per received symbol, $N_0$: one-sided noise power spectral
density). As shown in [2], it is sufficient to fix the differential
symbols to be uniformly, independently and identically
distributed, and to solely optimize the ring ratio.

Figure 2 shows the capacities of 16-ary modulation schemes
using two signaling amplitudes and multiple symbol differential
detection with N = 3. Clearly, for the AWGN channel,
where the amplitude transmit factor is constant, differential
encoding of the amplitude is not rewarding. In the case of fading
channels, absolute amplitude modulation without diversity
leads to a flattening of the capacity curve at high SNR.
This drawback is overcome by the proposed mapping, which
performs best over the whole region of SNR. Hence, the novel
scheme incorporates the advantages of the competitors.


[Plot omitted; horizontal axis: $10 \log_{10}(E_s/N_0)$ in dB]

Fig. 2: Capacities for AWGN and Rayleigh fading channel. N = 3.
Ring ratio $r_1/r_0 = 2$.

Notably, the attainable gain comes for free, since it does
not require any increase in the coding/decoding complexity
when used together with channel coding. The theoretical
statements have been verified by simulations, which show
close agreement. For details see [2].
Abstract - We discuss the application of coded modulation for
power-line  communications.  We  combine  M-FSK  with 
permutation  codes  to  include  frequency  and  time  diversity. 
This  makes  the  transmissions  robust  against  permanent 
frequency  disturbances  and  impulse  noise.  The  scheme  is 
applicable  to  any  frequency  range. 
keywords:  modulation;  power-line  communications;  coding. 


not decrease the distance to the correct word. Background noise: A
deletion error only reduces the distance to the correct code word
by one. An insertion error only reduces the distance to an incorrect
code word by one. The minimum distance decoder is capable of
correcting the combination of $d_{min} - 1$ of these types of errors.

III. CODE PROPERTIES


I. INTRODUCTION

Power Line Communication (PLC) can be seen as one of the
possible solutions to the "last dirty mile" problem for
communication providers. However, there are several obstacles.
According to the European standards (CENELEC), the
transmitters are output voltage limited and bandwidth limited. In
addition, there are different types of noise involved in PLC.
Narrow band noise is generated by television sets or computer
terminals. This type of noise is permanent over a long period of
time. Impulse noise has been reported in [1]. From this it can be
concluded that impulses are 0.1 to 1 second apart and have a
duration of typically less than 100 μsec. More details regarding
the channel properties can be found in [2].

The key idea in this contribution is the combination of the
following: 1) We use M-FSK for a constant envelope modulator
output; 2) We use a modified non-coherent demodulation with
multi-valued outputs; 3) We use a permutation code of length M
where every code word has M different symbols; 4) The decoding
is minimum distance decoding.

II. COMBINED MODULATION and CODING


An  interesting  mathematical  problem  is  the  design  of  codes.  The 
next  theorem  gives  an  upper  bound  on  the  number  of  code  words  in 
a  permutation  code. 

Theorem 1. For a permutation code of length M with M different
code symbols in every code word and minimum Hamming distance
$d_{min}$, the cardinality satisfies

$$|C| \le \frac{M!}{(d_{min} - 1)!}. \qquad (1)$$


It can be shown that for M < 6, codes exist that meet the
upper bound with equality for any $d_{min} \le M$. The smallest value of M
for which the upper bound (1) cannot be met with equality is M = 6
and $d_{min} = 5$. It has been shown that for these parameters |C| = 18
[3]. Blake [4] uses the concept of sharply k-transitive groups to
define permutation codes with minimum distance M - k + 1.
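
The bound of Theorem 1 is easy to evaluate and to compare against a small explicit code. The Python sketch below computes $M!/(d_{min} - 1)!$ and the minimum Hamming distance of a given set of permutations; the function names and the cyclic-shift example code are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import combinations


def permutation_code_upper_bound(m: int, d_min: int) -> int:
    """Upper bound M! / (d_min - 1)! on the size of a permutation code."""
    if not 1 <= d_min <= m:
        raise ValueError("need 1 <= d_min <= M")
    return math.factorial(m) // math.factorial(d_min - 1)


def minimum_hamming_distance(code) -> int:
    """Smallest number of differing positions over all pairs of code words."""
    return min(sum(x != y for x, y in zip(u, v))
               for u, v in combinations(code, 2))


if __name__ == "__main__":
    print(permutation_code_upper_bound(6, 5))  # 30, while the text reports |C| = 18
    # All cyclic shifts of (0, 1, 2, 3): any two shifts differ in every position.
    shifts = [tuple((i + s) % 4 for i in range(4)) for s in range(4)]
    print(len(shifts), minimum_hamming_distance(shifts),
          permutation_code_upper_bound(4, 4))
```

For M = 6 and $d_{min} = 5$ the bound evaluates to 30, which shows the gap to the value |C| = 18 quoted above; the cyclic-shift code of length 4 meets the bound with equality.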

The  following  Theorem  gives  the  parameters  of  an  example  of  a 
class  of  permutation  codes  based  on  a  multi-level  code  construction 
with  Reed  Muller  component  codes. 

Theorem 2. There exists an (M, |C|, $d_{min}$)-permutation code with the
following parameters:


Encoding: The information is encoded with a permutation code.
A permutation code C consists of |C| words of length M, where
every code word has M different symbols. The code has minimum
Hamming distance $d_{min}$.

Modulator:  The  symbols  of  a  code  word  are  transmitted  in  time  as 
the  corresponding  frequencies  of  an  M-ary  FSK  orthogonal  signal 
set.  Note  that  we  obtain  a  constant  signal  envelope  and  frequency- 
/  time  diversity  simultaneously. 

Modified demodulator: We use a simple modified non-coherent
demodulator with M envelope detectors. Every envelope detector
has a threshold $T_i$, 1 ≤ i ≤ M. The value of $T_i$ can be optimized
with respect to symbol detection error rate and depends on the
received signal energy and noise power spectral density per subchannel.
The demodulator outputs in parallel all envelopes that
are larger than their respective threshold $T_i$. Thus, the inputs to
the decoder for the permutation code are multi-valued.
Decoder:  We  use  minimum  distance  decoding,  i.e.  we  output  the 
message  corresponding  to  the  code  word  that  has  the  minimum 
number  of  differences  with  the  M  subsequent  detector  outputs. 
The  following  errors  may  occur  in  the  detector  output:  1) 
insertions  or  deletions  due  to  background  noise;  2)  single 
insertions  due  to  permanent  narrow-band  noise;  3)  multiple 
parallel  insertions  due  to  broad-band  (impulse)  noise. 
Performance: A permanent frequency disturbance only affects one
symbol in a code word of the permutation code. Impulse noise
may signal the presence of all frequencies in the demodulator
output. If restricted to one symbol time interval, this type of error
reduces the distance to an incorrect code word by one. It does


$$M = 2^m, \qquad (2a)$$

$$|C| = (2^{m+1} - 2) \times (2^m - 2) \times \dots \times (2^2 - 2), \qquad (2b)$$

$$d_{min} = 2^{m-1}, \qquad (2c)$$

where m is an arbitrary positive integer.

IV.  SIMULATION  RESULTS  and  CONCLUSIONS 

We  show  how  a  PLC  system  with  reasonable  transmission  speed 
can  use  this  modulation/coding  scheme.  It  appears  that  for  such  a 
system,  background  noise  is  of  no  importance  up  to  a  distance  of 
750 meters. Extensive simulation reports are available from [5].
Abstract - The performance of multilevel coded
modulation with multistage and iterative decoding
over phase noisy fading channels is evaluated. Semi-analytical
upper bounds on the bit error probability
are derived and verified to be tight.

I.  Summary 

Ever since coding for bandwidth limited channels with expanded
signal sets was introduced, the subtleties of trellis
codes against phase noise, especially over fading channels, have
been of interest in the literature. However, similar analysis
for multilevel coded modulation [1] can scarcely be found, although
the situation is rather different, due to the multiply
represented signal points in multistage decoding. The advantages
of this coding scheme are: (1) optimality in the information
theoretic sense is guaranteed over Gaussian channels with
staged decoding; (2) flexibility to coordinate the parameters;
and (3) applicability to unequal error protection (UEP) coding
as shown and analyzed in [2, 3]. In this paper, we extend
the results in [4], and evaluate the error performance of multilevel
codes with coherent detection over phase noisy flat fading
channels, based upon union bound arguments for the conditional
probability of a bit error and Monte Carlo integration.

Multilevel codes with multistage decoding can be constructed
based on unconventional partitioning, i.e., other than
Ungerboeck's set partitioning, effectively for both UEP and
equal error protection. Hence, various combinations of signal
partitioning, component codes and (asymmetric) constellations
can be considered, each usually showing a different bit
error rate characteristic [2, 3]. One of the goals of this paper
is to discuss the sensitivity of each coded level to phase noise
in the receiver for a number of code constructions, assuming
maximum likelihood decoding in each staged decoder.

With multistage decoding, at a given level of a multilevel
coded modulation system, let $P_e(w)$ denote the pairwise error
probability (PEP) that the decoder chooses a wrong codeword
different in w positions from the transmitted codeword. Conditioned
on a vector of fading amplitudes $\rho = (\rho_1, \dots, \rho_n)$ and
phase jitter components $\theta = (\theta_1, \theta_2, \dots, \theta_n)$, the PEP becomes
the same as that of an AWGN channel. In deriving this conditional
PEP, denoted $P_e(w | \rho, \theta)$, careful treatment is necessary
since in general different pairs of code sequences considered
in the union bound share the same decision regions. The line
joining the code sequences of each pair considered in the multidimensional
Euclidean space is no longer always orthogonal
to the decision region considered by the decoder, as shown in
[3]. The PEP can then be obtained by integrating $P_e(w | \rho, \theta)$
over the probability density functions of the random vectors
$\rho$ and $\theta$. In general, the expression for $P_e(w)$ is difficult, if
not impossible, to evaluate in closed form. Following the
approach of [5], the conditional PEP is first expressed with
an alternate form of the Gaussian Q-function. The resulting
expression, although it still needs to be evaluated numerically,
contains a single integral over a finite range and an integrand
that can be evaluated using a Gauss-Hermite quadrature integration
formula. Although semi-analytical in nature, the
results obtained constitute useful tools in the design of multilevel
codes for phase noisy fading channels, particularly when
the Hamming weight of the error events is relatively small.
This includes short block component codes as well as the error
floor region of turbo component codes. Moreover, the same
set of bounds can be used to evaluate the effect of co-channel
interference on the error performance of multilevel codes.

¹This work was supported in part by the Association of Radio Industries
and Businesses under the Public Participation Program for
Frequency Resources Development.
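
The benefit of an alternate, finite-range form of the Gaussian Q-function can be illustrated with a short Python sketch. It uses the representation $Q(x) = \frac{1}{\pi} \int_0^{\pi/2} \exp\left(-\frac{x^2}{2 \sin^2\theta}\right) d\theta$ for $x \ge 0$, commonly known as Craig's formula; that this is exactly the form used in [5] is an assumption. A plain midpoint rule stands in for the quadrature here, and the Gauss-Hermite step over the fading and phase variables mentioned in the text is not reproduced.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_function_finite_range(x: float, steps: int = 2000) -> float:
    """Q(x) for x >= 0 from the finite-range integral over (0, pi/2)."""
    if x < 0:
        raise ValueError("this form is valid for x >= 0 only")
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.exp(-x * x / (2.0 * math.sin(theta) ** 2))
    return total * h / math.pi


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.5):
        print(x, q_function(x), q_function_finite_range(x))
```

Because the argument x appears only inside the exponential and the integration limits are fixed, averaging over random fading amplitudes can be moved inside the integral, which is what makes this form attractive for PEP evaluation.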

On the other hand, the sensitivity in the waterfall region
with respect to phase noise, when turbo decoding is performed
in each stage, can be reduced in essence to that of decoding
over mismatched channels. A similar argument holds for iterative
decoding of multilevel codes, as discussed in [6], in
which the design rules of code construction are different from
those for multistage decoding. In both cases, certain performance
degradation due to phase noise has been observed by
simulation.
Abstract - It is well known that Ungerboeck's and
Imai/Hirakawa's multilevel coded modulation systems give
essential gain in comparison to a conventional coded modulation
system, but as far as we know a rigorous analysis of this
gain has not yet been done. In this work we present the
results of an asymptotic analysis and a comparison of two
coded modulation systems, conventional modulation and
multilevel modulation, using q-PSK signaling and transmission
over the AWGN channel.

I.  Introduction  and  Problem  Formulation 

We study the asymptotic performance of trellis-coded transmission over the AWGN channel with q-PSK signaling, where $q = 2^L$, $L$ is an integer, and the memory of the code goes to infinity. We consider two trellis-coded modulation systems: conventional trellis-coded modulation and multilevel modulation.

In the case of conventional modulation, the binary information sequence $u$ enters a memory-$m$, rate $R = b/c$ (bits/q-ary symbol) convolutional encoder whose output is over the ring $Z_q$ of integers modulo $q$. Each encoder output symbol $v_i$ is mapped onto a q-PSK signal waveform $s_i(t)$. The sequence of signal waveforms $s_i(t)$ is transmitted over the AWGN channel. The receiver is a maximum-likelihood (Viterbi) receiver.

In the case of multilevel modulation, the binary information sequence $u$ is first partitioned into $L$ binary subsequences $u^{(1)}, u^{(2)}, \ldots, u^{(L)}$. The subsequences $u^{(l)}$ are encoded by $L$ independent binary component convolutional codes of rates $R^{(l)} = b^{(l)}/c^{(l)}$ (bits/code symbol) and memories $m^{(l)}$. The set of $L$ bits (one output bit from each encoder) is synchronously mapped onto the signal point waveform. The sequence of signal waveforms $s$ is transmitted over the channel. The transmission rate is equal to $R = \sum_{l=1}^{L} R^{(l)}$ (bits/channel use). The multistage decoder consists of a set of $L$ Viterbi-type decoders matched to the codes used at the corresponding levels of encoding.
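As a rough illustration of the multilevel front end described above, the sketch below splits an information sequence into $L$ subsequences and maps one bit per level onto a $2^L$-PSK point. The round-robin partitioning and the natural binary labeling are assumptions made for this example; the text does not specify either, and the component encoders are omitted.

```python
import cmath
import math

def split_levels(u, L):
    """Partition a sequence into L subsequences (round-robin; an assumed rule)."""
    return [u[l::L] for l in range(L)]

def psk_point(bits):
    """Map L bits (level 0 = least significant; assumed labeling) to a 2^L-PSK point."""
    q = 1 << len(bits)
    index = sum(b << l for l, b in enumerate(bits))
    return cmath.exp(2j * math.pi * index / q)

levels = split_levels([1, 0, 1, 1, 0, 0], 3)
print(levels)
print(psk_point([lvl[0] for lvl in levels]))
```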

Let $\kappa$ and $\tilde{\kappa}$ be the decoder complexities (numbers of encoder states) for the conventional and the multilevel system, respectively, and let $P_e$ and $\tilde{P}_e$ be the decoding error probabilities for the two systems. We proved [1] that for all $R < C$, where $C$ is the capacity of the AWGN channel with q-PSK signaling, there exist, for both modulation systems, positive limits


Figure 1: Comparison of upper bounds $\overline{\gamma}(R)$, $\overline{\tilde{\gamma}}(R)$ and lower bounds $\underline{\gamma}(R)$, $\underline{\tilde{\gamma}}(R)$ of the overall state-complexity error exponents for conventional and multilevel modulation systems; $q = 32$, $E_s/N_0 = 10$ dB.


Figure  2:  The  optimistic  (upper),  pessimistic  (lower)  and  realis¬ 
tic  estimation  of  the  coding  gain  of  multilevel  modulation  system 
compared  to  the  conventional  modulation  system. 


Es/N0 [dB]    gain at R = 0 [dB]    (header illegible)    (header illegible)
0             0.85                  (illegible)           0.5
1             0.95                  (illegible)           0.51
3             1.16                  (illegible)           0.5
5             1.37                  (illegible)           0.47
8             1.66                  (illegible)           0.41
10            1.83                  2.51                  0.35
15            1.7                   (illegible)           (illegible)

Table 1: The asymptotical gain of multilevel 32-PSK signaling in comparison to conventional 32-PSK signaling for $R = 0$ and $R = R_0$ as a function of the signal-to-noise ratio $E_s/N_0$ for the multilevel system.
$$\gamma \stackrel{\text{def}}{=} -\lim_{\kappa \to \infty} \frac{\log P_e}{\log \kappa} > 0, \qquad \tilde{\gamma} \stackrel{\text{def}}{=} -\lim_{\tilde{\kappa} \to \infty} \frac{\log \tilde{P}_e}{\log \tilde{\kappa}} > 0.$$

We call $\gamma$ and $\tilde{\gamma}$ the (asymptotical) state-complexity error exponents for the modulation systems considered. Let $\underline{\gamma}$ and $\underline{\tilde{\gamma}}$ denote lower bounds for $\gamma$ and $\tilde{\gamma}$, respectively, and let $\overline{\gamma}$ and $\overline{\tilde{\gamma}}$ denote upper bounds.


II.  Comparison  of  Two  Modulation  Systems  and 
Numerical  Results 

In Figure 1 the curves $\underline{\gamma}(R)$, $\overline{\gamma}(R)$, $\underline{\tilde{\gamma}}(R)$ and $\overline{\tilde{\gamma}}(R)$ are given. In Figure 2 three bounds for the coding gain of the multilevel system in comparison with the conventional system are presented. The realistic bound corresponds to the coding gain of the multilevel system over the conventional system, given that the upper (existence) bounds for the decoding error probabilities are the same.

In Table 1 the gains are presented for different signal-to-noise ratios $E_s/N_0$ at rate $R = 0$ and at the computational cutoff rate $R = R_0 = \tilde{R}_0$.
Abstract — The natural analogues of Lee weight and Gray map over $\mathbb{F}_4$ are introduced. Self-dual codes for the Euclidean scalar product with Lee weights a multiple of 4 are called Type II. They produce Type II binary codes by the Gray map. All extended Q-codes [3] of length a multiple of 4 are Type II; this includes generalized quadratic residue codes attached to a prime power $q \equiv 7 \pmod 8$. Certain double circulant codes are also considered. The first binary extremal singly-even [92,46,16] self-dual code is constructed. A general mass formula is derived.

I.  Definitions  and  first  properties 

Let $\mathbb{F}_4 := \{0, 1, \omega, \bar{\omega} = \omega^2\}$ be the finite field of order 4. A code $C$ of length $n$ over $\mathbb{F}_4$ is an $\mathbb{F}_4$-subspace of $\mathbb{F}_4^n$. Duality for codes is understood with respect to the Euclidean form $\sum_i x_i y_i$. $C$ is said to be self-dual if $C = C^\perp$. The Lee composition of a vector $x = (x_1, \ldots, x_n) \in \mathbb{F}_4^n$ is defined as $(n_0(x), n_1(x), n_2(x))$, where $n_0(x)$ is the number of $x_i = 0$, $n_2(x)$ the number of $x_i = 1$, and $n_1(x) = n - n_0(x) - n_2(x)$, where $n$ is the length. The Lee weight $w_L(x)$ of $x$ is then defined as $n_1(x) + 2n_2(x)$. There is a natural (not $\mathbb{F}_4$-linear!) Gray map $\phi$, which is an $\mathbb{F}_2$-linear isometry from ($\mathbb{F}_4^n$, Lee distance) onto ($\mathbb{F}_2^{2n}$, Hamming distance), where the Lee distance of two codewords $x$ and $y$ is the Lee weight of $x - y$. We let, for all $x, y \in \mathbb{F}_2^n$,

$\phi(\omega x + \bar{\omega} y) = (x, y)$.

This leads us to introduce a Euclidean weight $w_E(\cdot)$ on $\mathbb{F}_4$ by the rule $w_E(0) = 0$, $w_E(\omega) = w_E(1) = 1$, $w_E(\bar{\omega}) = 2$. Observe that $x \mapsto \omega x$ is an isometry from $(\mathbb{F}_4, w_E)$ to $(\mathbb{F}_4, w_L)$.
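The Gray map defined above can be checked on small examples. In the sketch below the four field elements are written as the characters `0`, `1`, `w` (for $\omega$) and `W` (for $\bar{\omega}$); this encoding and the function names are choices made for the illustration only. Each element $\omega a + \bar{\omega} b$ is stored as the bit pair $(a, b)$, so that $1 = \omega + \bar{\omega}$ corresponds to $(1, 1)$.

```python
from itertools import product

# Each element omega*a + omegabar*b of F4 is stored as the bit pair (a, b).
GRAY = {"0": (0, 0), "w": (1, 0), "W": (0, 1), "1": (1, 1)}
INVERSE = {pair: sym for sym, pair in GRAY.items()}

def gray_map(word):
    """phi(omega*x + omegabar*y) = (x, y): all a-bits followed by all b-bits."""
    pairs = [GRAY[s] for s in word]
    return tuple(a for a, _ in pairs) + tuple(b for _, b in pairs)

def lee_weight(word):
    """w_L = n_1 + 2*n_2, where n_2 counts the symbol 1 and n_1 counts w and W."""
    return sum(1 for s in word if s in ("w", "W")) + 2 * word.count("1")

def f4_add(u, v):
    """Addition in F4^n is a coordinatewise XOR of the bit pairs."""
    return "".join(INVERSE[(GRAY[s][0] ^ GRAY[t][0], GRAY[s][1] ^ GRAY[t][1])]
                   for s, t in zip(u, v))

# Isometry check: Lee distance equals Hamming distance of the Gray images.
for u in map("".join, product("01wW", repeat=2)):
    for v in map("".join, product("01wW", repeat=2)):
        hamming = sum(a != b for a, b in zip(gray_map(u), gray_map(v)))
        assert hamming == lee_weight(f4_add(u, v))
```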

Since multiplying a column by $\omega$ does not preserve the Euclidean or Lee weight of a codeword, we need a restricted definition of equivalence: we say that two codes are equivalent if one can be obtained from the other by permuting the coordinates (this is not the usual monomial equivalence).

A self-dual code over $\mathbb{F}_4$ is said to be Type II if the Lee weight of every codeword is a multiple of 4 and Type I otherwise. The following proposition follows:

Proposition I.1 If $C$ is self-orthogonal, so is $\phi(C)$. In this case $\phi(C)$ is a Type I (resp. Type II) code iff $C$ is a Type I (resp. Type II) code.


II.  Constructions 

(a)  Quadratic  residue  codes 

Let $q$ be a power of a prime with $q$ congruent to 3 (mod 8). Let $C(q)$ denote the extended generalized quadratic residue code of length $q + 1$ over $\mathbb{F}_4$ [2].

Proposition II.1 The code $C(q)$ is a Type II code over $\mathbb{F}_4$.


(b)  Quadratic  double  circulant  codes 

Recently  a  class  of  codes  which  generalizes  binary  double 
circulant  codes  and  the  Pless  symmetry  codes  to  codes  over 
F4  was  introduced  in  [1].  These  codes  are  also  Type  II.  The 
following  table  gives  the  parameters  of  these  codes  along  with 
those  of  their  binary  images: 


n     k     d     n_b    k_b    d_b    Type
14    7     6     28     14     6      I
16    8     6     32     16     8      II
46    23    14    92     46     16     I
48    24    14    96     48     16     II
62    31    16    124    62     16     I
64    32    16    128    64     16     II

Table 1: Quadratic double circulant codes over $\mathbb{F}_4$ and their Type II binary images

(c)  Q-codes 

The case of quadratic residue codes is a special case of Q-codes of prime length. An extended Q-code of composite length is Euclidean self-dual if and only if its length is a multiple of primes which are congruent to 3 modulo 4 ([3]). We already saw that Euclidean self-dual quadratic residue codes are Type II. We now generalize this result to Q-codes:

Theorem II.2 Let $C$ be an odd-like duadic code (or Q-code) over $\mathbb{F}_4$ of length $n \equiv 3 \pmod 4$, with splitting given by $\mu_{-1}$. Let $\hat{C}$ be the extended code of $C$. Then the Gray image of $\hat{C}$ is of Type II.


III.  Classification 

To  elaborate  a  mass  formula,  useful  for  a  complete  classifica¬ 
tion,  we  need  to  know  the  number  of  distinct  Type  II  codes, 
which  is  given  by: 


Theorem III.1 Let $n$ be an integer multiple of 4 and let $N_{II}(n)$ be the number of distinct Type II codes over $\mathbb{F}_4$. Then:


Ndu(n)=  J] 

1=1 


+3.2"' 


1 


4*  -  1 
Abstract — Recently Type II codes over F4 have
been  introduced  as  Euclidean  self-dual  codes  with  the 
property  that  all  Lee  weights  are  divisible  by  four.  We 
construct  new  extremal  Type  I  and  Type  II  codes,  and 
show  that  there  are  seven  Type  II  codes  of  length  12, 
up  to  permutation-equivalence. 

I.  Introduction 

Recently Gaborit, Pless, Solé and Atkin [1] introduced Type II codes over $\mathbb{F}_4 = \{0, 1, \omega, \bar{\omega} = \omega^2\}$. These codes are closely related to binary Type II codes via the Gray map defined in [2].

A linear code $C$ of length $n$ and dimension $k$ over $\mathbb{F}_4$ is a $k$-dimensional vector subspace of $\mathbb{F}_4^n$. A code $C$ is said to be Euclidean self-dual (resp. self-orthogonal) if $C = C^\perp$ (resp. $C \subseteq C^\perp$), where $C^\perp$ is the dual code of $C$ under the Euclidean inner product.

Let $n_0(x)$, $n_\omega(x)$, $n_{\bar{\omega}}(x)$ and $n_1(x)$ be the numbers of 0's, $\omega$'s, $\bar{\omega}$'s and 1's in a vector $x \in \mathbb{F}_4^n$, respectively. The Lee weight $wt_L(x)$ of $x$ is defined as $2n_1(x) + n_\omega(x) + n_{\bar{\omega}}(x)$. Type II codes are self-dual codes under the Euclidean inner product with the property that all Lee weights are divisible by four. Euclidean self-dual codes which are not Type II are called Type I. Type II codes are divided into two classes, namely, odd Type II codes and even Type II codes.

The Hamming weight of $x$ is the number of non-zero components of $x$. The minimum Lee weight $d_L$ (resp. Hamming weight $d_H$) of $C$ is the smallest Lee (resp. Hamming) weight among all non-zero codewords of $C$.

We have found several properties of even Type II codes as well as odd Type II codes from properties of binary Type II codes. For example, there is a Type II code of length $n$ if and only if $n$ is divisible by four. The minimum Lee weight $d_L$ of a Type II code of length $n$ is upper bounded by $d_L \le 4\lfloor n/12 \rfloor + 4$. A Type II code of length $n$ with $d_L = 4\lfloor n/12 \rfloor + 4$ is extremal. We have found that an even Type II code is not extremal for lengths $n > 16$.
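The extremality condition is easy to tabulate. The short sketch below assumes the bound reads $d_L \le 4\lfloor n/12 \rfloor + 4$ (the fraction is garbled in the source; this reading agrees with the "(extremal)" labels in Table 2) and applies it to the lengths and minimum Lee weights listed in that table. The function name is hypothetical.

```python
def lee_bound(n):
    """Upper bound 4*floor(n/12) + 4 on the minimum Lee weight (assumed reading)."""
    if n % 4 != 0:
        raise ValueError("Type II codes over F4 exist only for lengths divisible by 4")
    return 4 * (n // 12) + 4

# Length -> minimum Lee weight, as listed in Table 2 of this paper.
table2 = {20: 8, 24: 8, 28: 12, 32: 12, 36: 12}
extremal = {n: d == lee_bound(n) for n, d in table2.items()}
print(extremal)
```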

II.  Classification  of  Lengths  up  to  12 

There is a unique Type II code $C_4$ of length 4 [1]. Let $C_8$ be the code with generator matrix $(I_4, J_4 - I_4)$, where $I_4$ and $J_4$ are the identity matrix and the all-ones matrix of order 4, respectively. $C_8$ is the only extremal even Type II code, up to permutation-equivalence. $C_8$ and $C_4^2$ are the only Type II codes of length 8, up to permutation-equivalence [1].

The  classification  of  Type  II  codes  of  length  12  is  given  in 
Table  1.  The  mass  formula  in  [1]  shows  that  our  classification 
is  complete. 

Theorem 1 There are seven Type II codes of length 12, up to permutation-equivalence.


Table 1: The Type II codes of length 12

Codes       d_L    |PAut(C_{12,i})|    phi(C_{12,i})
C_{12,1}    4      10368               eg (garbled)
C_{12,2}    4      16128               eg (garbled)
C_{12,3}    4      972                 D24
C_{12,4}    4      432                 D24
C_{12,5}    4      23040               A24
C_{12,6}    4      1152                F: m (garbled)
C_{12,7}    8      660                 G24

The generator matrices $(I, A_i)$ of $C_{12,i}$ are listed in the form $a_1, a_2, \ldots, a_6$, where $a_j$ is the $j$-th row of $A_i$.

Ai  :  OOOOukj,  OOOOww,  OOwniOO,  OODcjOO,  wwOOOO,  tDwOOOO, 

A2  :  OOOOww,  OOOOww,  011100, 101100, 110100, 111000, 

A3  :  OOOOwiqOOoAIiOO,  OujuHjjIu),  OujujICou)  ,  uiOluiujut ,  cDOcDcocol, 
A4  :  OOOOlliu),  OujuluIcj,  OlucjcjIlu,  QuiuiloIuj  ,uj1111ui  ,u)ujujujujui  , 
A5  :  000111,001011,010011, 100011,  llllwu),  llllww, 

A6  :  000111,  0cjlouju)1,  OuiljojuI,  Iu/uiuiOu),  IG/uiOloui,  llluwl, 

A7  :  0wuQQl,w0QulQ,uuQfal,Qu0Qlu,Qlu)ni,lQl(jll. 

III.  New  Extremal  Type  II  Codes 

A pure double circulant code of length $2n$ has a generator matrix of the form $(I, R)$, where $I$ is the identity matrix of order $n$ and $R$ is an $n \times n$ circulant matrix. Extremal double circulant Type II codes are given in Table 2.


Table 2: Type II pure double circulant codes

Codes        The first row r             d_L
D_{II,20}    lliDwllOOOO                 8 (extremal)
D_{II,24}    110111101000                8
D_{II,28}    uitouJujOuiLjOl 10000       12 (extremal)
D_{II,32}    wwOwOlwlwOwOOOOO            12 (extremal)
D_{II,36}    oiOldiaiailwwlOOlOOOOO      12

Extremal Type II codes of lengths 16, 20 and 28 were constructed in [1]. $D_{II,32}$ is the first example of an extremal Type II code of length 32.
Abstract — The problem of finding the values of $A_q(n, d)$, the maximum size of a code of length $n$ and minimum distance $d$ over an alphabet of $q$ elements, is considered. When $q < M < 2q$, all parameters for which $A_q(n, d) = M$ are determined. Methods for obtaining upper and lower bounds on $A_q(n, d)$ are discussed.


I.  Introduction 

Let $Z_q$ denote the set $\{0, 1, \ldots, q-1\}$ and let $Z_q^n$ be the set of all n-tuples (vectors) over $Z_q$. An $(n, M, d)_q$ code is a code over $Z_q$ that has length $n$, size $M$, and minimum distance $d$. One of the main problems in combinatorial coding theory is to find the largest possible value of $M$ when the other parameters have been fixed; this value is denoted by $A_q(n, d)$, and corresponding codes are called optimal.

Linear  quaternary  codes  have  earlier  been  considered, 
for  example,  in  [3].  Except  for  some  preliminary  results 
of  this  work,  which  were  presented  in  [2],  only  sporadic 
results  have  been  published  earlier  in  the  general  quater¬ 
nary  case. 


II.  On  Small  Optimal  Codes 

To  obtain  our  main  result,  we  combine  the  Plotkin 
bound,  the  juxtaposition  construction,  a  result  by 
Baranyai  [1],  and  the  following  theorem,  which  general¬ 
izes  a  result  from  [4]. 


III.  Finding  Bounds  on  Aq(n,d) 

Upper bounds on $A_q(n, d)$ can be obtained, for example, from the Plotkin bound, the Hamming bound, and the linear programming bound.

Lower bounds on $A_q(n, d)$ are obtained by constructing corresponding codes. An exhaustive computer search is out of the question for all but the smallest parameters. To search for codes, we therefore have to use stochastic methods and/or prescribe a structure of the codes to restrict the search.

As for the structure of the codes, four different methods have been used for $q = 4$. These give additive codes over $Z_2 \times Z_2$, lexicographically minimal codes, codes that consist of orbits of words under the action of a permutation group, and codes that consist of cosets of a linear code, respectively.

To give an example, the 13 vectors below generate a $(12, 2^{13}, 5)$ additive code over $Z_2 \times Z_2$, a current record. (The four symbols 00, 01, 10, 11 of $Z_2 \times Z_2$ are written 0, 1, 2, 3.)


111110000000 

110001110000 

101001001100 

010100101010 

211001000001 

022001101000 

021200001001 


130102100001 

110020201000 

011121021001 

101100102200 

000121002020 

131121002002 
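The parameters of such an additive code can be checked by brute force, since addition in $Z_2 \times Z_2$ is a bitwise XOR of the two-bit symbols. The sketch below packs each word into an integer (two bits per symbol), spans the generators, and reports the code size and minimum distance. The generator list is transcribed from the vectors above and may carry extraction errors, so no expected output is asserted for it; the helper names are hypothetical.

```python
def pack(word):
    """Pack a string over {0,1,2,3} into an integer, two bits per symbol."""
    value = 0
    for ch in word:
        value = (value << 2) | int(ch)
    return value

def weight(value):
    """Number of nonzero two-bit symbols in a packed word."""
    count = 0
    while value:
        if value & 3:
            count += 1
        value >>= 2
    return count

def span(generators):
    """All sums (XOR combinations) of the generators, as packed integers."""
    code = {0}
    for g in map(pack, generators):
        code |= {c ^ g for c in code}
    return code

def min_distance(generators):
    """Minimum distance of an additive code = minimum nonzero weight."""
    return min(weight(c) for c in span(generators) if c)

GENERATORS = [
    "111110000000", "110001110000", "101001001100", "010100101010",
    "211001000001", "022001101000", "021200001001", "130102100001",
    "110020201000", "011121021001", "101100102200", "000121002020",
    "131121002002",
]
print(len(span(GENERATORS)), min_distance(GENERATORS))
```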


Theorem 1 Suppose we have a resolvable PBD($v = M$, $K$; $\lambda$) with $n$ parallel classes, where each parallel class has at most $q$ blocks. Then there exists an equidistant $(n, M, n - \lambda)_q$ code.

The main theorem is as follows.

Theorem 2 Let $q < M < 2q$. Then an $(n, M, n - \lambda)_q$ code exists if and only if $n/\lambda \le M(M-1)/(2(M-q))$. For $M \ne 2q - 1$ equality implies that such a code is optimal.

Corollary 1 For $q < M < 2q$, $A_q(n, d) = M$ exactly when

$$\frac{(M+1)^2 - 3(M+1) + 2q}{(M+1)^2 - (M+1)} < \frac{d}{n} \le \frac{M^2 - 3M + 2q}{M^2 - M}.$$

This work was partially supported by the Bulgarian National Science Fund and by the Academy of Finland.

We have collected the best known bounds on $A_4(n, d)$ for $n \le 12$.
Abstract — We consider the problem of finding the maximal size $A_3(d, w_0, w_1, w_2)$ of a ternary constant-composition code. We describe a construction of ternary constant-composition codes that proves $A_3(4, 4m+1, 2, 1) = (m+1)(4m+2)$ and $A_3(4, 4m-1, 2, 1) = m(4m+2)$.

This work was supported by the Swedish Research Council for the Engineering Sciences under grant 271-97-532.

I. Introduction

We study the problem of determining the maximal size of a ternary block code with constant composition. The metric we are interested in is the Hamming metric. Let each codeword have $w_0$ 0's, $w_1$ 1's and $w_2$ 2's. Denote the minimum Hamming distance of a code by $d$ and let $A_3(d, w_0, w_1, w_2)$ denote the maximal size of a code. We let $n$ denote the length of a code. The corresponding functions $A_2(n, d)$ for binary codes without restrictions, $A_2(n, d, w)$ for binary constant-weight codes and $A_3(n, d)$ for ternary codes without restrictions have been thoroughly investigated. The papers [1] and [2] contain extensive lists of references on these problems. The problem of determining $A_3(d, w_0, w_1, w_2)$ on the other hand has received very little attention. Two references for results on this problem are [3] and [4].

We focus on ternary constant-composition codes with Hamming weight three. Without loss of generality we assume $w_1 = 2$ and $w_2 = 1$. In [5] we presented a construction of codes with this composition and minimum distance three, whereas here we give a construction of codes with minimum distance four.

II. Construction

Let $m$ be a positive integer. Take $D^*$ to be the $m \times (2m+1)$ matrix with

$$D^*_{ij} = \begin{cases} 2, & \text{if } j = 1; \\ 1, & \text{if } j = i + 1 \text{ or } j = 2m - i + 2; \\ 0, & \text{otherwise.} \end{cases}$$

Let $D$ be the $m(2m+1) \times (2m+1)$ matrix with rows equal to all different cyclic shifts of the rows of $D^*$, in arbitrary order. Let $D_1$ and $D_2$ be two $m(2m+1) \times (2m+1)$ matrices. Take $D_1$ to be 1 in exactly those positions where $D$ is 1, and take it to be 0 elsewhere. Similarly, take $D_2$ to be 2 in exactly those positions where $D$ is 2, and take it to be 0 elsewhere. We note that all rows of $D_1$ are different and that for any selection of two columns of $D_1$ there is exactly one row that has its 1's in these two columns.

We use the notation $I_{2m+1}$ for the $(2m+1) \times (2m+1)$ identity matrix and $2I_{2m+1}$ for the $(2m+1) \times (2m+1)$ matrix with 2's on the main diagonal and 0's elsewhere. We construct matrices $B_1$, $B_2$, $B_3$ and $B_4$ as follows:

$$B_1 = [\, 1\;0 \mid I_{2m+1} \mid 2I_{2m+1} \,], \qquad B_2 = [\, 0\;1 \mid 2I_{2m+1} \mid I_{2m+1} \,],$$

$$B_3 = [\, 0\;0 \mid D_1 \mid D_2 \,], \qquad B_4 = [\, 0\;0 \mid D_2 \mid D_1 \,],$$

where the two leading entries are repeated in every row of the respective matrix.

Let $C_{4m+1}$ be the code consisting of the union of all the rows of $B_1$, $B_2$, $B_3$ and $B_4$. We define $C_{4m-i}$ for $i = 0, 1, 2$ to be $C_{4m+1}$ shortened with respect to 0's in the first $i + 1$ positions.
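The construction can be checked by computer for small $m$. The sketch below builds the rows of $B_1, \ldots, B_4$ and verifies the composition, the size and the minimum distance. Two details are assumptions made for this illustration because the source is garbled at those points: the entry 2 of $D^*$ is placed in column 1, and the rows of $B_1$ and $B_2$ are given the distinct two-symbol prefixes 10 and 01 (identical prefixes would leave some rows of $B_1$ and $B_2$ at distance 2).

```python
from itertools import combinations

def build_code(m):
    """Rows of B1..B4 for the code of length 4m+4, as tuples over {0, 1, 2}."""
    n = 2 * m + 1
    d_star = []
    for i in range(1, m + 1):
        row = [0] * n
        row[0] = 2                # column 1 (assumed position of the 2)
        row[i] = 1                # column i+1
        row[2 * m - i + 1] = 1    # column 2m-i+2
        d_star.append(row)
    d = [r[s:] + r[:s] for r in d_star for s in range(n)]  # all cyclic shifts
    d1 = [[1 if x == 1 else 0 for x in r] for r in d]
    d2 = [[2 if x == 2 else 0 for x in r] for r in d]
    eye = [[1 if a == b else 0 for b in range(n)] for a in range(n)]
    two_eye = [[2 * x for x in r] for r in eye]
    code = []
    for a in range(n):
        code.append(tuple([1, 0] + eye[a] + two_eye[a]))   # B1 (prefix assumed)
        code.append(tuple([0, 1] + two_eye[a] + eye[a]))   # B2 (prefix assumed)
    for r1, r2 in zip(d1, d2):
        code.append(tuple([0, 0] + r1 + r2))               # B3
        code.append(tuple([0, 0] + r2 + r1))               # B4
    return code

def min_distance(code):
    return min(sum(x != y for x, y in zip(u, v)) for u, v in combinations(code, 2))

def shorten(code, t):
    """Keep the words with 0's in the first t positions and delete those positions."""
    return [c[t:] for c in code if not any(c[:t])]

for m in (1, 2, 3):
    code = build_code(m)
    print(m, len(code), min_distance(code), len(shorten(code, 2)))
```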


III. Bounds on $A_3(4, w_0, 2, 1)$

Our main result is the following theorem:

Theorem 1 For any integer $m \ge 1$, the equalities

$A_3(4, 4m+1, 2, 1) = (m+1)(4m+2)$,

$A_3(4, 4m-1, 2, 1) = m(4m+2)$

hold.

We are currently not aware of any codes having a larger number of codewords than $C_{4m}$ as constructed above. However, larger codes than $C_{4m-2}$ are known for small $m$; see also [3] and [4].
Abstract — In the presentation we find an analytic expression for the maximum of the normalized entropy $-\sum_{i \in T} p_i \ln p_i / \sum_{i \in T} i p_i$, where the set $T$ is the disjoint union of sets $S_n$ of positive integers that are assigned probabilities $P_n$, $\sum_n P_n = 1$. This result is applied to the computation of the capacity of weakly $(d,k)$-constrained sequences that are allowed to violate the $(d,k)$-constraint with small probability.

I. Problem description and results

Let $T$ be a set of positive integers, and assume that $T$ is the disjoint union of a (finite or infinite) number of non-empty sets $S_n$, $n \in M$. Also assume that there are given numbers $P_n > 0$, $n \in M$, with $\sum_n P_n = 1$. We show the following result.

Theorem: The maximum of

$$-\frac{\sum_{i \in T} p_i \ln p_i}{\sum_{i \in T} i\, p_i} \qquad (1)$$

(ln: natural logarithm) under the constraints that $p_i \ge 0$, $\sum_{i \in S_n} p_i = P_n$, $n \in M$, equals $z_0$, where $z_0 > 0$ is the unique solution $z$ of the equation

$$-\sum_{n \in M} P_n \ln Q_n(z) = -\sum_{n \in M} P_n \ln P_n \qquad (2)$$

with, for $z > 0$,

$$Q_n(z) := \sum_{i \in S_n} e^{-iz}, \quad n \in M. \qquad (3)$$

Moreover, the optimal $p_i$ are given by

$$p_i = \frac{P_n}{Q_n(z_0)}\, e^{-i z_0}, \quad i \in S_n,\ n \in M, \qquad (4)$$

and for these $p_i$ we have that

$$\sum_{i \in T} i\, p_i = \frac{d}{dz}\Big[ -\sum_{n \in M} P_n \ln Q_n(z) \Big](z_0). \qquad (5)$$

As an application of this result we consider weakly constrained $(d,k)$ sequences [1]. A binary $(d,k)$-constrained sequence has by definition at least $d$ and at most $k$ 'zeros' between consecutive 'ones'. Weakly constrained codes produce sequences that violate the specified constraints with a small probability. It is argued that if the channel is not free of errors, it is pointless to feed the channel with perfectly constrained sequences. A $(d,k)$-constrained sequence can be thought to be composed of 'phrases' $10^i$, $d \le i \le k$, where $0^i$ means a series of $i$ 'zeros'. In order to compute the channel capacity, i.e. the maximum $z_0/\ln 2$ of the entropy $H/\ln 2$, we define

$$T = \{1, \ldots, d\} \cup \{d+1, \ldots, k+1\} \cup \{k+2, k+3, \ldots\} =: S_1 \cup S_2 \cup S_3, \qquad (6)$$

where $d = 0, 1, \ldots$, and $k = d+1, d+2, \ldots$ are given, and we compute the capacity for the case that the probabilities $P_1$, $P_3$ assigned to the sets $S_1$, $S_3$ are both small. Clearly, the quantities $P_1$ and $P_3$ denote the probabilities that phrases are transmitted that are either too short or too long, respectively. We find that the familiar capacities of $(d,k)$-constrained sequences [2] are approached from above as $P_1, P_3 \to 0$ with an error $A(P_1 \ln P_1 + P_3 \ln P_3)$, where we can evaluate $A$ explicitly. We obtain a similar result for the case that $T$ is as in (6) with $S_1$, $S_3$ merged into a single set $S_1 \cup S_3$. Further results are published in [3].

Conclusions

We have presented an analytic expression for the maximum of the normalized entropy $-\sum_{i \in T} p_i \ln p_i / \sum_{i \in T} i p_i$ under the condition that $T$ is the disjoint union of sets $S_n$ of positive integers that are assigned probabilities $P_n$, $\sum_n P_n = 1$. We computed the capacity of weakly $(d,k)$-constrained sequences that are allowed to violate the $(d,k)$-constraint with given probability.
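The theorem of this paper lends itself to a direct numerical check. The sketch below solves $\sum_n P_n \ln\big(Q_n(z)/P_n\big) = 0$ with $Q_n(z) = \sum_{i \in S_n} e^{-iz}$ by bisection, builds the distribution $p_i = P_n e^{-i z_0}/Q_n(z_0)$, and compares the resulting normalized entropy with $z_0$. It is restricted to finite sets $S_n$ (an assumption made to keep the example short; the infinite set $S_3$ of the application would need a closed-form geometric sum), and all function names are hypothetical.

```python
import math

def solve_z0(sets, probs, tol=1e-13):
    """Root of sum_n P_n * ln(Q_n(z) / P_n) = 0, found by bisection."""
    def f(z):
        return sum(p * math.log(sum(math.exp(-i * z) for i in s) / p)
                   for s, p in zip(sets, probs))
    lo, hi = 0.0, 1.0
    while f(hi) > 0:          # f is decreasing in z; expand until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def optimal_distribution(sets, probs, z0):
    """p_i = P_n * exp(-i*z0) / Q_n(z0) for i in S_n."""
    dist = {}
    for s, p in zip(sets, probs):
        q = sum(math.exp(-i * z0) for i in s)
        for i in s:
            dist[i] = p * math.exp(-i * z0) / q
    return dist

def normalized_entropy(dist):
    """-sum p_i ln p_i / sum i p_i."""
    return (-sum(p * math.log(p) for p in dist.values())
            / sum(i * p for i, p in dist.items()))

sets, probs = [[1], [2, 3]], [0.3, 0.7]
z0 = solve_z0(sets, probs)
print(z0, normalized_entropy(optimal_distribution(sets, probs, z0)))
```

With a single set $T = \{1, 2\}$ and $P = 1$ the equation reduces to $e^{-z} + e^{-2z} = 1$, whose solution gives the familiar capacity $\log_2\big((1+\sqrt{5})/2\big)$.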
Abstract — We study the number of binary sequences whose differences do not include certain disallowed patterns. We show that the number of such
sequences  increases  exponentially  with  their  length 
and  that  the  exponent,  or  capacity,  is  the  logarithm 
of  the  joint  spectral  radius  of  an  appropriately  defined 
set  of  matrices.  We  derive  a  new  algorithm  for  deter¬ 
mining  the  joint  spectral  radius  of  sets  of  nonnegative 
matrices  and  combine  it  with  existing  algorithms  to 
determine  the  capacity  of  several  sets  of  disallowed 
differences  that  arise  in  practice. 

I.  Codes  that  avoid  difference  patterns 

The  bit-error-rate  of  a  recording  channel  is  often  dominated 
by  a  small  set  of  error,  or  difference,  patterns.  Binary  codes 
have been proposed which exploit this fact, e.g., [1]. The codes
are  designed  to  avoid  the  most  problematic  difference  patterns 
by  constraining  the  set  of  allowed  recorded  sequences  and  have 
been  shown  to  improve  system  performance.  In  order  to  maxi¬ 
mize  the  achievable  linear  density  for  a  recording  channel,  it  is 
important  to  identify  constraints  that  permit  the  highest  pos¬ 
sible  code  rate.  To  that  end,  we  study  the  largest  number  of 
sequences  whose  differences  exclude  a  given  set  of  disallowed 
patterns. 

More specifically, let $D$ be a finite set of finite-length disallowed difference patterns, and let $C_n$ be a collection of n-bit sequences whose differences do not contain any patterns in $D$. The largest number of sequences whose pairwise differences do not include any pattern in $D$ is

$$\delta_n(D) \stackrel{\text{def}}{=} \max\{|C_n| : C_n \text{ avoids } D\}.$$

We define the capacity of $D$ as the limit

$$\mathrm{cap}(D) \stackrel{\text{def}}{=} \log\Big[\lim_{n \to \infty} (\delta_n(D))^{1/n}\Big].$$

We show that, for every finite $D$,

$$\mathrm{cap}(D) = \log \rho(\Sigma(D)),$$

where $\Sigma(D)$ is an appropriately defined set of adjacency matrices and $\rho$ is the joint spectral radius of the set [2]. This equality may be viewed as a generalization of the well-known result that the growth rate of the number of sequences, or Shannon capacity, of a constrained system is the logarithm of the spectral radius of an appropriately defined adjacency matrix.

II.  Computing  the  joint  spectral  radius 

Computing  the  joint  spectral  radius  of  a  set  of  matrices 
is,  in  general,  a  hard  problem  [3].  Algorithms  for  computing 
it  have  been  described  in  [4,  5].  We  derive  a  new  algorithm 


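Table 1: Capacity of various difference sets D

m         D              cap(D)    O
m >= 1    (illegible)    0         (illegible)
2         +-             a         (illegible)
2         ++             a         (illegible)
3         0+0            .5        (illegible)
3         +0+            a         (illegible)
3         +++            b         (illegible)
3         +-+            b         (illegible)

$a = \log_2((1 + \sqrt{5})/2) = .6942\ldots$

$b = \log_2\big((1 + (19 + 3\sqrt{33})^{1/3} + (19 - 3\sqrt{33})^{1/3})/3\big) = .8791\ldots$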

to compute $\rho(\Sigma(D))$ and determine or closely approximate cap($D$) for a number of difference sets $D$ of practical interest.
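The new algorithm itself is not described in this abstract. For orientation only, the sketch below shows the standard brute-force bounds on the joint spectral radius of a finite set of matrices: over all products $P$ of length $k$, $\max \rho(P)^{1/k}$ is a lower bound and $\max \|P\|^{1/k}$ (here with the maximum row-sum norm) is an upper bound. It is limited to $2 \times 2$ real matrices so that the spectral radius has a closed form; the function names are hypothetical and this is not the authors' method.

```python
import math
from itertools import product

def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))

def spectral_radius(a):
    """Largest eigenvalue modulus of a real 2x2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = trace * trace - 4 * det
    if disc >= 0:
        root = math.sqrt(disc)
        return max(abs(trace + root), abs(trace - root)) / 2
    return math.sqrt(det)  # complex conjugate pair: |lambda|^2 = det

def row_sum_norm(a):
    return max(sum(abs(x) for x in row) for row in a)

def jsr_bounds(mats, k):
    """Lower and upper bounds on the joint spectral radius from length-k products."""
    lower = upper = 0.0
    for combo in product(mats, repeat=k):
        p = combo[0]
        for m in combo[1:]:
            p = matmul(p, m)
        lower = max(lower, spectral_radius(p) ** (1.0 / k))
        upper = max(upper, row_sum_norm(p) ** (1.0 / k))
    return lower, upper

A = ((1, 1), (1, 0))
lo, hi = jsr_bounds([A], 8)
print(math.log2(lo), math.log2(hi))
```

For the single matrix `A` the lower bound equals the golden ratio, so its base-2 logarithm reproduces the value $.6942\ldots$ that appears in Table 1. The number of products grows exponentially in $k$, which is one way to see why the general problem is hard.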

Table  1  summarizes  known  values  of  cap(D)  for  a  number 
of  difference  sets  D  consisting  of  a  single  pattern  of  length  m. 
Next  to  the  capacity,  we  list  a  constraint  describing  a  sequence 
of codes, $\{C_n\}$, such that each $C_n$ avoids $D$ and

$$\lim_{n \to \infty} \log |C_n|^{1/n}$$

achieves cap($D$). The constraint is defined by a list of forbidden patterns O. If no superscript is listed with a pattern,
the  pattern  is  forbidden  from  appearing  in  all  positions  of  the 
code.  If  superscripts  appear,  then  the  patterns  are  periodic 
and  the  period  is  one  more  than  the  largest  superscript.  The 
superscript  then  represents  the  positions  (modulo  the  period) 
in  which  the  pattern  is  disallowed  from  starting. 
Abstract — In this paper we consider the analysis and design
of optimal block-decodable M-ary runlength-limited (RLL)
codes.  We  present  two  general  construction  methods:  one  based 
on  permutation  codes  due  to  Datta  and  McLaughlin,  and  the 
other  a  nonbinary  generalization  of  the  binary  enumeration 
methods  of  Patrovics  and  Immink,  and  Gu  and  Fuja.  The 
construction  based  on  permutation  codes  is  simple  and 
asymptotically  (in  blocklength)  optimal,  while  the  other 
construction  is  optimal  in  the  sense  that  the  resulting  codes 
have  the  highest  rate  among  all  block-decodable  codes  for  any 
blocklength.  In  the  process,  we  shall  also  prove  a  new  result  on 
the  capacity  of  (M,d,k)  constraints.  Finally,  we  present  examples 
of  remarkably  low-complexity  (M,d,k)  block  codes  which 
achieve  the  optimal  rate  without  the  use  of  enumeration. 

I.  Introduction 

Traditional  optical  recording  employs  saturation  recording, 
where  the  channel  input  is  constrained  to  be  a  binary  sequence 
satisfying runlength-limiting (RLL) or $(d,k)$ constraints. A binary $(d,k)$-constrained sequence is one in which the number of zeroes between consecutive ones is at least $d$ and at most $k$. The idea of optical recording with $M$ ($M > 2$) levels has been proposed [1], and previous work in coding for such nonbinary channels includes [1]-[3]. Assuming an M-ary symbol alphabet $\{0, 1, \ldots, M-1\}$, $M < \infty$, an M-ary runlength-limited or $(M,d,k)$ sequence [3]-[4] is one where at least $d$ and at most $k$ zeroes occur between nonzero symbols. Binary $(d,k)$ codes are $(M,d,k)$ codes with $M = 2$.

In  this  paper  we  present  two  broad  code  construction  techniques 
for  block-decodable  ( M,d,k )  codes.  First,  we  give  a  new  result  on 
the  capacity  of  ( M,d,k )  constraints;  this  leads  to  a  simple  code 
construction  which  produces  codes  that  asymptotically  (in 
blocklength)  achieve  capacity.  Second,  we  extend  the  enumerative 
construction  of  Patrovics  and  Immink  [5]  to  the  nonbinary  case. 
We  show  how  this  algorithm  can  be  used  to  design  optimal 
deterministic  block  codes;  these  codes  are  optimal  in  the  sense 
that  they  have  the  highest  possible  rate  among  all  block- 
decodable  codes.  Finally,  we  present  examples  of  M-ary  block 
codes  that  achieve  the  optimal  rate  through  a  novel  use  of  lookup 
tables  rather  than  the  more  complex  enumeration  scheme. 

II.  ON  THE  CAPACITY  OF  (M,d,k)  CODES 

The allowable sequences in an $(M,d,k)$-constrained code are made up of phrases, where each phrase begins with at least $d$ and at most $k$ zeroes and ends with a single nonzero symbol. For example, an allowable $(M,d,k) = (5,1,7)$ sequence is

0002  00001  0000004  003  CM  00000004 

where  individual  phrases  have  been  underlined  for  emphasis. 

Next, we state a result on capacity. This is the M-ary
generalization of Theorem 1 of Zehavi and Wolf [6]. Let X_i be a
random variable describing the number of symbols in the i-th
phrase of the parsed sequence, and let A_i be a random variable
denoting the nonzero value (amplitude) of the terminating symbol
of the phrase.

Theorem  1.  The  code  achieving  maximum  information  rate  has 
the  following  properties: 

(1) The random variables A_1, A_2, ... are statistically independent
and uniformly distributed.

(2) The random variables X_1, X_2, ... are statistically independent
and identically distributed.

(3) The probability distribution of X is

P(X = i) = (M-1) 2^{-iC}, i = d+1, ..., k+1,


where  C  is  the  capacity  of  the  ( M,d,k )  constraint.  Any  (M,d,k)  code 
that  achieves  capacity  satisfies  (l)-(3),  and  conversely,  any  code 
satisfying  (l)-(3)  achieves  capacity.  □ 
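
Property (3) also pins down the capacity numerically: since the phrase-length probabilities must sum to one, C is the root of sum_{i=d+1}^{k+1} (M-1) 2^{-iC} = 1. The sketch below solves this equation by bisection. It is only an illustration under that reading of Theorem 1; the function names `mdk_capacity` and `phrase_length_distribution` are hypothetical and do not come from the paper.

```python
import math


def mdk_capacity(M, d, k, tol=1e-12):
    """Capacity (bits/symbol) of the (M,d,k) constraint, found by bisection
    on the normalisation condition sum_{i=d+1}^{k+1} (M-1) 2^{-iC} = 1."""

    def excess(C):
        # Left-hand side minus one; strictly decreasing in C.
        return sum((M - 1) * 2.0 ** (-i * C) for i in range(d + 1, k + 2)) - 1.0

    # excess(0) >= 0 and excess(log2 M) <= 0, so the root lies in between.
    lo, hi = 0.0, math.log2(M)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def phrase_length_distribution(M, d, k):
    """Capacity-achieving phrase-length probabilities P(X = i) from property (3)."""
    C = mdk_capacity(M, d, k)
    return {i: (M - 1) * 2.0 ** (-i * C) for i in range(d + 1, k + 2)}


if __name__ == "__main__":
    print(mdk_capacity(2, 1, 3))               # binary (1,3) RLL constraint
    print((24 / 26) / mdk_capacity(5, 2, 10))  # efficiency of a rate 24/26 code
```

For M = 2 the routine reduces to the usual binary (d,k) capacity computation, and the second printed value should come out close to the 92.4% efficiency quoted for the (5,2,10) code in Section IV.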

Using  this  theorem,  we  present  an  asymptotically  efficient,  fixed- 
rate,  parallel  encoder  that  is  scalable  with  respect  to  M  and 
maintains backward compatibility with a binary RLL system [9].

III.  Block  codes  based  on  ( M,d,k,l,r )  sequences 
In our talk, we present a nonbinary generalization of the
enumeration algorithm given by Patrovics and Immink [5]. Using
this enumeration algorithm for (M,d,k,l,r) sequences, we are now
able to extend two important (d,k) code constructions ([7], [8]) to
the nonbinary case [9]-[10]. Based on this, we can show that the
optimal rate of the (M,d,k) code with blocklength n is

Ropt  =I°g2(  EA^-l («-/))/«. 

i=d  / 

IV.  Examples  of  optimal  (M,d,k)  block  codes 
Next,  we  present  examples  of  block-encodable/block-decodable 
codes  which  achieve  the  optimal  code  rate,  but  do  not  require  the 
aforementioned  enumeration  algorithm.  Rather,  these  codes  use  a 
series  of  look-up  tables  consisting  of  ‘templates’  in  order  to 
encode  and  decode  with  very  low  complexity.  As  a  result,  the 
storage  space  required  for  these  codes  is  remarkably  small. 

Specifically, for an (M,1,7)-constrained code with blocklength
n = 8, it can be shown that the optimal rate is R_opt = 10/8, 13/8 for
M = 5, 9 respectively. In our talk, we show how one look-up table
consisting of 37 templates of 8-bit codewords can be used to
generate both of these optimal codes.

Finally, we present an optimal, 92.4% efficient, (5,2,10)-code
with blocklength n = 26 and R_opt = 24/26. This code requires the use
of 6 look-up tables containing a total of only 203 13-bit templates
[9].

Abstract - Suppose we are given a block code, that
is, a list of at least 2^p q-bit self-concatenable codewords.
A rate p : q block encoder is a dataword-to-codeword
assignment from 2^p p-bit datawords to 2^p q-bit
codewords, and the corresponding block decoder
is the inverse of the encoder. We propose efficient
heuristic computer algorithms (i) to eliminate the excess
codewords; and (ii) to construct low hardware
complexity block encoders/decoders. Constructing
low-complexity encoders/decoders for very high rate
codes is of immense economic value, as these codes
may be implemented in mass-market magnetic recording
systems. For several practical constraints, block
encoders/decoders generated using the proposed algorithms
are comparable in complexity to human-generated
encoders/decoders, but are significantly
simpler than lexicographical encoders/decoders.

I.  Extended  Abstract 

Constrained coding is used in magnetic recording systems
to encode unconstrained user sequences into channel
sequences that satisfy certain hard constraints, such as various
limits on the run lengths of zeroes. A block code is a collection
of codewords satisfying a certain constraint such that
these codewords can be freely concatenated with each other
without violating the underlying constraint. Block codes have
been widely used for converting unconstrained user sequences
into the desired constrained sequences. The basic idea in a rate
p : q block code is to identify a codebook containing 2^p q-bit
codewords that satisfy the desired constraint, and to design
an encoder that assigns each of the 2^p p-bit datawords in a
one-to-one and onto fashion to a q-bit codeword in the codebook.
In other words, a block encoder is a dataword-to-codeword
assignment. The corresponding block decoder is the inverse
mapping, or the codeword-to-dataword assignment.

We motivate the problem of interest using a concrete example:
the (d, k) = (0,2) run-length limited (RLL) constraint,
which demands that runs of consecutive symbols "0" must not
be longer than 2. We are interested in a rate 4 : 5 block code
for this constraint. A set of valid 5-bit codewords for this
constraint can be obtained by starting from all 5-bit words,
eliminating all words that have more than two consecutive
symbols "0" anywhere in the word, and eliminating
all words that have more than one symbol "0" at the beginning
or at the end of the word. This process leaves a set of 17
codewords which can be freely concatenated without violating
the constraint.
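
The elimination procedure just described is small enough to check by brute force. The following sketch enumerates all 5-bit words, applies the two elimination rules, and confirms that the survivors can be concatenated freely. The helper names `rll02_block_codewords` and `freely_concatenable` are illustrative and not taken from the paper.

```python
from itertools import product


def rll02_block_codewords(length=5):
    """All words of the given length usable in a (0,2) RLL block code."""
    words = []
    for bits in product("01", repeat=length):
        word = "".join(bits)
        if "000" in word:
            continue  # more than two consecutive zeroes inside the word
        if word.startswith("00") or word.endswith("00"):
            continue  # more than one zero at the beginning or at the end
        words.append(word)
    return words


def freely_concatenable(words):
    """True if no concatenation of two words contains a run of three zeroes."""
    return all("000" not in a + b for a in words for b in words)


if __name__ == "__main__":
    codebook = rll02_block_codewords()
    print(len(codebook), freely_concatenable(codebook))
```

Because every word has at most one zero at either edge, any junction between two codewords contributes a run of at most two zeroes, which is why the pairwise check succeeds.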

Since 17 > 16 = 2^4, this set of codewords can support
a rate 4 : 5 block code. Thus, the problem is (i) to select


Datawords   Codewords
excess      10101
0000        10010
0001        10110
0010        10011
0011        10111
0100        01001
0101        01101
0110        11001
0111        11101
1000        01010
1001        01110
1010        01011
1011        01111
1100        11010
1101        11110
1110        11011
1111        11111

Table 1: A block encoder for the (0,2) RLL constraint.

an excess codeword and (ii) to determine a mapping from
the remaining 16 codewords to the set of all 4-bit datawords.
There are 17 choices for the excess codeword, and for each such
choice there are 16! choices for the encoder. In all there are
17! ≈ 3.5568 × 10^14 ways to select a codebook and an encoder!
In other words, there is a great amount of freedom in selecting
the encoder/decoder pair to implement a given block code.
In this paper, we are interested in exploiting this freedom to
select an encoder/decoder pair that has a low-complexity
hardware implementation. Typically, given the large number
of possibilities, a brute-force search is out of the question
even for relatively low rate block codes. Currently such a task is
performed in a laborious, ad-hoc, and human-centric fashion,
and becomes nearly impossible for very high-rate codes.

As our main contribution, we propose efficient heuristic
computer algorithms to select a codebook and to construct
a low-complexity encoder/decoder; for example, the encoder in
Table 1 was found using the new algorithm. Furthermore, we
demonstrate the algorithm using rate 8 : 9 block codes for
(0,4/4) and (0,3/6) PRML constraints.

I. Introduction

This paper analyzes the distribution of cycle lengths in
turbo decoding graphs. It is known that the widely-used
iterative decoding algorithm for turbo codes is in fact a special
case of a quite general local message-passing algorithm [1] for
efficiently computing posterior probabilities in acyclic directed
graphical (ADG) models (also known as “belief networks”) [2,
3]. However, this local message-passing algorithm in theory
only works for graphs with no cycles. Why it works in practice
(i.e., performs near-optimally in terms of bit decisions)
on ADGs for turbo codes is not well understood, since turbo
decoding graphs can have many cycles.


Fig.  1:  An  example  of  a  turbo  decoding  graph  for  a  K  =  6,  n  =  12, 
rate  1/2  turbocode. 

II.  Method 

The  ADG  model  for  a  turbo-decoder  can  be  reduced  to 
what  we  call  a  turbo  decoding  graph  (Figure  1),  which  is  an 
undirected  graph  capturing  the  inherent  loop  structure  of  a 
turbo  decoder.  There  are  two  parallel  chains,  each  having  n 
nodes  (for  real  turbo  codes,  n  can  be  very  large,  e.g.,  n  = 
64,000).  Each  node  is  connected  (via  a  U  node)  to  exactly 
one  node  on  the  other  chain  and  these  one-to-one  connections 
are  chosen  randomly,  e.g.,  by  a  random  permutation  of  the 
sequence  {1,2,...,  n}. 
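
The graph just described is easy to build and probe by simulation. The sketch below constructs two chains of n nodes joined by a random permutation (with the U nodes already collapsed into single cross edges) and finds the shortest simple cycle through a given node by breadth-first search. It is a simulation aid under these assumptions, not the analytical counting method of the paper, and all function names are hypothetical.

```python
import random
from collections import deque


def build_graph(n, perm):
    """Adjacency sets for two chains of n nodes; node (0, i) on the first
    chain is joined to node (1, perm[i]) on the second chain."""
    adj = {(c, i): set() for c in (0, 1) for i in range(n)}
    for c in (0, 1):
        for i in range(n - 1):
            adj[(c, i)].add((c, i + 1))
            adj[(c, i + 1)].add((c, i))
    for i, j in enumerate(perm):
        adj[(0, i)].add((1, j))
        adj[(1, j)].add((0, i))
    return adj


def shortest_cycle_through(adj, v, max_len):
    """Length of the shortest simple cycle through v, or None if every
    cycle through v is longer than max_len."""
    best = None
    for u in adj[v]:
        # Shortest path from u back to v that avoids the edge {u, v}.
        dist = {u: 0}
        queue = deque([u])
        found = None
        while queue and found is None:
            x = queue.popleft()
            if dist[x] + 2 > max_len:
                break  # closing a cycle from here would exceed max_len
            for y in adj[x]:
                if x == u and y == v:
                    continue  # this is the removed edge
                if y == v:
                    found = dist[x] + 2
                    break
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if found is not None and (best is None or found < best):
            best = found
    return best


def fraction_without_short_cycles(n, k, samples, rng):
    """Monte Carlo estimate of P(no cycle of length <= k at a random node)."""
    perm = list(range(n))
    rng.shuffle(perm)
    adj = build_graph(n, perm)
    nodes = [(rng.randrange(2), rng.randrange(n)) for _ in range(samples)]
    free = sum(shortest_cycle_through(adj, v, k) is None for v in nodes)
    return free / samples


if __name__ == "__main__":
    print(fraction_without_short_cycles(2000, 10, 200, random.Random(0)))
```

With the identity permutation every node lies on a cycle of length 4, which is the shortest cycle this graph structure allows; a random permutation makes such short cycles rare when n is large.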

To help count the cycles in the graph, we drop the U nodes,
and label the edges in any simple cycle as

1. →: “Left-to-right on a chain” (e.g., S? → Sj in Figure 1),

2. ←: “Right-to-left on a chain” (e.g., S3 ← Sj), or

3. =: “Across the chains” (e.g., S3 = Sj).

For example, the cycle Sf - Si - S| - Sj - S4 - S3 - S? will
be labeled →→=←←=. Starting from a node on a chain, and given
a label sequence L ∈ {→, ←, =}^+, there is at most one cycle
labeled L.

This work was supported in part by NSF CAREER award IRI-9703120
and by AFOSR grant F49620-97-1-0313.

Fig. 2: Theoretical vs. simulation estimates of the probability of
no cycles of length k or less, as a function of k, in a turbo decoding
graph (chain length n = 64,000).

We count the number of cycles of length k
at a node by computing

1.  The  total  number  of  possible  label  sequences  L, 

2. The probability of finding a cycle with the label
sequence L.

More  complete  details  can  be  found  in  [4]. 

III.  Conclusions 

Using this general approach, we estimate the probability
that there exist no simple cycles of length ≤ k at a randomly
chosen node in a turbo decoding graph. In Figure 2, we compare
analytical and simulation results. For turbo codes
with a block length of 64,000, a randomly chosen node has a
less than 1% chance of being on a cycle of length less than or
equal to 10, but has a greater than 99.9% chance of being on
a cycle of length less than or equal to 20.

Abstract - The error correction capability of interleaved
linear block codes is discussed. We assume
that the channel behaves such that each row of a received
array is either error free or corrupted by many
symbol errors. Provided that the error vectors are
linearly independent, we show that some interleaved
block codes can correct asymptotically one erroneous
row per redundant row, even without having reliability
information from the channel output. An efficient
decoding algorithm that achieves the error correction
capability is presented. Using this algorithm we derive
a random access scheme that has many similarities
with the Aloha system. This paper represents
a generalization of our work [2]. As it finally turned
out, many ideas from [2] were already discussed in
1990 by Metzner and Kapturowski [3].

I.  Introduction 

Block interleaving of linear block codes is a well known method
for the correction of long error bursts. To this end, we arrange n_2
codewords of an (n_1, k_1, d_1)-code as the columns of an n_1 × n_2
interleaver matrix. Then the matrix is transmitted over the
channel row by row. Column-wise BMD decoding does
not exploit the knowledge that errors occur in bursts and that
only a limited number of rows is corrupted. In this paper we
present a decoding algorithm that makes use of these facts.

II.  Transmission  scheme 

Each column of the n_1 × n_2 block interleaver matrix C represents
a codeword of a given linear block code C_1(n_1, k_1, d_1)
with parity check matrix H. The symbol alphabet corresponds
to a finite field denoted by A. We will consider C to be one
code matrix of an n_1 × n_2 array code. The errors inserted
by the channel can be described by an additive error matrix
F ∈ A^{n_1 × n_2}, where R = C + F. Using the parity
check matrix H we can calculate a syndrome vector for each
column of the received matrix. Arranging these syndromes
as columns of an (n_1 - k_1) × n_2 matrix we get the so-called
syndrome matrix S, where S = H · R holds. It follows that
rank(S) = rank(F), as long as t < d_1 - 1 is fulfilled, where t
denotes the number of erroneous rows.
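
The rank relation can be checked on a toy example. The sketch below uses the binary (7,4,3) Hamming code as the column code, which is an assumption made purely for illustration (the paper does not fix a particular code), and compares rank(S) with rank(F) over GF(2). The helpers `gf2_rank`, `gf2_matmul` and `gf2_add` are hypothetical names.

```python
def gf2_rank(matrix):
    """Rank over GF(2) of a matrix given as a list of 0/1 rows."""
    rows = [row[:] for row in matrix]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def gf2_matmul(A, B):
    """Matrix product A * B over GF(2)."""
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]


def gf2_add(A, B):
    """Entrywise sum of two matrices over GF(2)."""
    return [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Parity check matrix of the (7,4,3) Hamming code (n_1 = 7, d_1 = 3).
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

# Code matrix with n_2 = 4 columns: the codeword 1110000 and three zero words.
C = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0],
     [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

# One erroneous row (t = 1 < d_1 - 1): the ranks agree.
F1 = [[0, 0, 0, 0] for _ in range(7)]
F1[2] = [1, 0, 1, 1]
S1 = gf2_matmul(H, gf2_add(C, F1))

# Three independent erroneous rows (t = 3): the condition is violated.
F3 = [[0, 0, 0, 0] for _ in range(7)]
F3[0], F3[1], F3[2] = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]
S3 = gf2_matmul(H, gf2_add(C, F3))

if __name__ == "__main__":
    print(gf2_rank(F1), gf2_rank(S1))
    print(gf2_rank(F3), gf2_rank(S3))
```

In the second case the three affected columns of H are linearly dependent, so the syndrome matrix loses rank relative to F, which illustrates why a bound on t in terms of d_1 is needed.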

III.  Error  correction  capability 

The number of erroneous rows is the metric that we use for
decoding. Hence an optimal decoder tries to find the code
matrix that has as many rows identical to those of R as possible. It
can be shown that R can be correctly decoded if the following
condition is fulfilled:

t < d_1 - 1 - (t - rank(F)). (1)


Hence, for linearly independent error vectors we can correct
d_1 - 2 erroneous rows without using any soft information for
symbols or rows. It can be shown for many applications that
the matrix dimensions can be designed such that the probability
of linearly dependent error vectors becomes arbitrarily
small [2]. Using an MDS code as column code, this means that
the corresponding interleaved code can correct n_1 - k_1 - 1
erroneous rows without using any reliability information from the
channel output, provided that the error vectors are linearly
independent.

IV.  Decoding 

A  decoding  algorithm  can  be  derived  that  actually  achieves 
the  error  correction  capability  for  linearly  independent  error 
vectors  [2]  [3].  The  complexity  of  this  algorithm  has  order 
0(nf  ■  712).  The  algorithm  can  be  generalized  for  the  case  of 
linearly dependent error vectors, such that the error correcting
capability corresponding to eqn. (1) is achieved. The generalized
algorithm has low complexity for small values of t - rank(F). It
can be shown that the problem of correcting a linearly dependent
error pattern is equivalent to finding a minimum weight
codeword of the code that is defined by a submatrix of an
equivalent parity check matrix H'.

V.  Aloha-Like  Random  Access  Scheme  without 
Feedback 

The Aloha system is a simple and well known random access
scheme. Nevertheless it requires collision detection and feedback
from the receiver or channel to the transmitters. The
idea is to consider one row of the interleaver matrix as one
received data slot of a random access scheme and to use a
long RS code as column code. It turns out that the proposed
scheme has the same throughput as slotted Aloha (1/e) without
requiring a feedback channel or additional redundancy for
error detection.

Abstract - We present the performance limits of
concatenated codes with interleaver of infinite size
and under iterative decoding. We study the propagation
of the probabilities at the output of the SISO
decoder, and give a general formula for the density
propagation through iterations.

I.  Introduction 


resulting from the product of d - 1 independent extrinsic
informations supplied by the other d - 1 subcode neighbors. The
total APP is the product of d extrinsic informations and the
initial observation. Let B_m be the partial a posteriori
log-likelihood ratio (LLR) at iteration m (bit position j omitted):


8m  =  log 


p(r/c  =  1  )p(c  =  1) 
p(r/c  =  0)p(c  =  0) 


Compound codes have been extensively studied in the literature
[1]-[6]. All these codes are decoded iteratively since no
maximum-likelihood decoding algorithm of reasonable complexity
is available. Recently, [7] and [8] presented a method
for determining the performance limits of LDPC codes under
iterative decoding. Their approach is based on the estimation
of the probability density function of the decoder output from
its input density. In this paper, we establish a general density
propagation formula available for all isotropic codes, i.e.,
when the probability distribution of the a posteriori probability
(APP) is independent of the bit position, and we give
numerical results for different compound codes.

II.  Isotropy  of  constituent  codes 


The  density  propagation  through  the  graph  can  be  summa¬ 
rized  by  the  following  general  formula 


Bm  —  So  +  (d  -  1)  ® 


ELo^4-  ®  [exP(gm-l)]®‘  r 

Er=0  ^?®[eXp(Rm-l)]®'  . 


The total APP distribution is equal to the convolution of the
B_m density and the extLLR_m density. If p_m(x) is the probability
density function of LLR_m and if the all-zero codeword
has been transmitted, the bit error probability at iteration
m is Pe_m = ∫_0^{+∞} p_m(x) dx. The performance limit of the
iterative SISO decoder is given by the minimal value of the
signal-to-noise ratio Eb/N0 for which Pe_m tends to 0 when m
goes to +∞.


IV.  Numerical  results 


All concatenated codes can be modeled as a graph having two
types of nodes representing bits and subcodes respectively. In
the sequel, this graph is assumed to be cycle-free, i.e., the
length of the interleaver is infinite. Let C(n,k) be the linear
binary constituent block code. The APP associated to a
coded bit c_j, j = 1 ... n, can be written as a function of the
conditional  weight  enumerator  (Aci~  ,  );,j=i . n  : 

n 

APP{Cj)  oc  y^Ac/t  ®  [p(r/c)p(c)]®'[p(r/c)p(c)]®n-' 

i=0 


The following table summarizes the thresholds of different
compound codes, obtained by a Monte Carlo method.

Code                    Rate      Constituent code(s)                   Threshold
PCCC [3]                R = 0.5   C = (13,15)                           0.58 dB
SCCC [4]                R = 0.5   C1 = (17,6,15), C2 = (31,25,33,37)    0.87 dB
Block GLD [6]           R = 0.5   C = (15,11)                           0.83 dB
Convolutional GLD [5]   R = 0.5   C = (13,15,2,14)                      0.85 dB

where r is the received symbol, c being the transmitted symbol, and
p(r_ℓ/c_ℓ)p(c_ℓ), ℓ = 1, ..., n are identically distributed. X + Y
and XY are respectively denoted 2 ⊗ X and X^{⊗2} when X and
Y are identically distributed. If the probability distribution
of the APP information is independent of the bit position, the
constituent code is said to be isotropic. All bits in the graph
are then equally protected by the information propagation.
For example, cyclic codes and extended BCH primitive codes
are isotropic codes.

III.  Log-likelihood  ratio  density  propagation 

Let us describe the information propagation in the graphical
model of the concatenated code. Here d (resp. n, the code
length or a restricted window containing the local constraints
for a convolutional code) is the degree of the bit node (resp.
the subcode node). The constituent code is assumed to be
isotropic. A subcode node computes an extrinsic information
extLLR_m from its n - 1 inputs. A bit node evaluates its a
posteriori probability LLR_m, combining the channel observation,
the extrinsic information, and the a priori probability


References 

[1]  R.G.  Gallager,  “Low-density  parity-check  codes,”  MIT  Press, 
1963. 

[2]  R.M.  Tanner,  “A  recursive  approach  to  low  complexity  codes," 
IEEE  Trans,  on  Information  Theory ,  vol.  27,  Sept.  1981. 

[3] C. Berrou, A. Glavieux, P. Thitimajshima, “Near Shannon
Limit Error-Correcting Coding and Decoding: Turbo-Codes,”
ICC’93, Geneve, May 1993.

[4] S. Benedetto, G. Montorsi, D. Divsalar, F. Pollara, “Serial
concatenation of interleaved codes: Performance analysis, design
and iterative decoding,” TDA Progress Report 42-126, 1995.

[5] S. Vialle, J. Boutros, “A Gallager-Tanner construction based
on convolutional codes,” WCC’99, Paris, Jan. 1999.

[6]  J.  Boutros,  O.  Pothier,  G.  Zemor,  “Generalized  Low  Density 
(Tanner)  Codes,”  ICC’99,  Vancouver,  June  1999. 

[7]  A.J.  Felstrom  and  K.Sh.  Zigangirov,  “Time  varying  periodic 
convolutional  codes  with  Low-Density-Parity-Check  matrix," 
IEEE  Trans,  on  Information  Theory,  vol.  45,  Sept.  1999. 

[8]  T.  Richardson,  R.  Urbanke,  “The  capacity  of  low-density  parity 
check  codes  under  message-passing  decoding,”  Bell  Labs  report, 
Nov.  1998. 


0-7803-5857-0/00/$10.00 ©2000 IEEE.



ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


On  the  Training  Distortion  of  Vector  Quantizers 

Tamas  Linder 

Dept,  of  Mathematics  &  Statistics 
Queen’s  University 
Kingston,  Ontario,  Canada  K7L  3N6 
email: linder@mast.queensu.ca


Abstract - The in-training-set performance of a vector
quantizer as a function of its training set size is
investigated. For squared error distortion and independent
training data, worst-case type upper bounds are
derived on the minimum training distortion achieved
by an empirically optimal quantizer. These bounds
show that the training distortion can underestimate
the minimum distortion of a truly optimal quantizer
by as much as a constant times n^{-1/2}, where n is the
size of the training data. Earlier results provide lower
bounds of the same order.

I.  Introduction 

A d-dimensional k-point vector quantizer Q is a (measurable)
mapping of R^d into a finite set of points {y_1, ..., y_k},
called the codebook. Let Q_k denote the family of all
d-dimensional k-point vector quantizers. Given a d-dimensional
random vector X with distribution μ_X, a quantizer Q* ∈ Q_k
is called an optimal k-point quantizer for μ_X if it has minimum
mean squared distortion in Q_k:

D(Q*) = E[||X - Q*(X)||^2] = min_{Q ∈ Q_k} E[||X - Q(X)||^2].

Assume that a quantizer is to be designed on the basis of
the training data X_1, X_2, ..., X_n consisting of n vectors
independently drawn according to μ_X. In general, the objective of
a quantizer design algorithm (such as the generalized Lloyd
algorithm) is to find an empirically optimal quantizer Q_n^* ∈ Q_k
whose distortion in quantizing the training data is minimum:

D_n(Q_n^*) = min_{Q ∈ Q_k} (1/n) Σ_{i=1}^{n} ||X_i - Q(X_i)||^2.

The random quantity D_n(Q_n^*) is called the training distortion
of Q_n^*. Since the training distortion is obtained as a
by-product of the design procedure without requiring additional
test data, it can be considered an inexpensive estimate
of D(Q*). It is easy to see that D_n(Q_n^*) is optimistically
biased in the sense that E[D_n(Q_n^*)] ≤ D(Q*) (the inequality
is strict whenever D(Q*) > 0). The size of the bias was
first investigated in a work by Kim and Bell [1] who showed
that E[D_n(Q_n^*)] ≤ D(Q*)(1 - 1/n) for any source distribution
with a finite second moment. Our main result shows that
this bound can be considerably improved in a worst case sense:
the difference D(Q*) - E[D_n(Q_n^*)] of the minimum distortion
of an optimal quantizer and the expected training distortion of
the empirically optimal quantizer can be as large as a constant
times n^{-1/2}.
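
The optimistic bias follows from a one-line argument: the empirically optimal quantizer does at least as well on the training data as any fixed quantizer, so D_n(Q_n^*) ≤ D_n(Q*) for every sample, and the expectation of D_n(Q*) over the training data equals D(Q*). The sketch below illustrates the first inequality in the scalar case d = 1, where an empirically optimal quantizer can be computed exactly by dynamic programming over the sorted sample. This restriction to d = 1 and the function names are assumptions made for illustration; the paper treats general d.

```python
import random


def optimal_training_distortion(xs, k):
    """Minimum of (1/n) * sum_i (x_i - Q(x_i))^2 over all scalar quantizers
    with at most k codepoints, via dynamic programming on the sorted data."""
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums give the squared error of xs[i:j] about its mean in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cell_cost(i, j):
        s = s1[j] - s1[i]
        return max((s2[j] - s2[i]) - s * s / (j - i), 0.0)

    inf = float("inf")
    best = [0.0] + [inf] * n  # with zero cells only the empty prefix is covered
    answer = inf
    for _ in range(k):
        new = [inf] * (n + 1)
        for j in range(1, n + 1):
            new[j] = min(best[i] + cell_cost(i, j) for i in range(j))
        best = new
        answer = min(answer, best[n])
    return answer / n


def empirical_distortion(xs, codebook):
    """Training distortion of a fixed nearest-neighbour quantizer."""
    return sum(min((x - y) ** 2 for y in codebook) for x in xs) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.uniform(-1.0, 1.0) for _ in range(30)]
    fixed = [-2.0 / 3.0, 0.0, 2.0 / 3.0]  # optimal 3-point codebook for U(-1, 1)
    print(optimal_training_distortion(sample, 3), empirical_distortion(sample, fixed))
```

The dynamic program relies on the fact that, for squared error on the real line, optimal cells are intervals and optimal codepoints are cell means, so it suffices to search over contiguous partitions of the sorted sample.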

II.  Minimax  bounds  on  the  Training  distortion 

Let 𝒫(B) denote the class of all source distributions which
satisfy the peak power constraint P{(1/d)||X||^2 ≤ B} = 1.

This  research  was  supported  in  part  by  the  Natural  Sciences 
and  Engineering  Research  Council  (NSERC)  of  Canada. 


In other words, for any B > 0, the class 𝒫(B) consists of
all source distributions whose support is contained in the ball
{x : ||x|| ≤ √(dB)}.

Theorem  1  For  any  quantizer  dimension  d>  1  and  codebook 
size  fc  >  3  there  exists  a  source  distribution  fix  €  F(B)  such 
that  for  all  training  set  size  n  >  | fc, 

E[Dn{Q*n)}<D{Q')-C^^ 

j  4 

where  c(B,  d,  fc)  = 

If  the  relative  difference  is  considered,  the  following  simple 
bound  can  be  obtained  in  terms  of  the  training  ratio  0  =  n/k. 

Theorem  2  For  any  quantizer  dimension  d  >  1  and  codebook 
size  fc  >  3  there  exists  a  source  distribution  fix  £  P(P)  such 
that  for  all  training  set  size  n  >  § fc, 

E[Dn(Qn)\  <D{Q')(\--^ 
where  Co  =  \\J\  ~  0.27. 

Note that in the above bounds the “bad” source distribution
giving a large bias does not depend on the training data
size n. Thus Theorem 1 guarantees the existence of at least
one fixed source distribution in 𝒫(B) such that

liminf_{n→∞} √n (D(Q*) - E[D_n(Q_n^*)]) > 0.

In  contrast,  the  worst  case  bound  developed  in  [2]  on  the 
test  distortion  of  an  empirically  optimal  quantizer  is  obtained 
by  constructing  a  different  “bad”  source  distribution  for  each 
training  data  size  n. 

Using an earlier result [3], it can be shown that Theorem 1
is essentially tight. We can conclude that for all k ≥ 3 and all
n large enough,

c/√n ≤ sup_{μ_X ∈ 𝒫(B)} (D(Q*) - E[D_n(Q_n^*)]) ≤ c'/√n

for some constants c, c' > 0 depending on d, k, and B.

Abstract - We present a new asymptotic quantization
theory on a plane for a known smooth non-uniform
data density. Based upon a two-stage model,
we design a mapping for harmonic cluster. We argue
angular phase field. We give a relative distortion
mismatch for a case of asymptotic clusters, and optimize
over the cluster centers.

I.  Introduction 

When ||·||^s represents the s-th power (s > 0) of the Euclidean
distance, the integral

∫_{R^2} min_{1≤i≤N} ||x - y_i||^s p(x) dx    (1)

measures the performance of an N-quantizer, y_1, ..., y_N, of a
random point from a smooth probability density p(x) on a
plane. We study the asymptotic geometry of a near-optimal
quantizer when N is large enough. Let β = s/2 and ρ =
2/(s + 2), and let ||p||_ρ denote {∫ p(x)^ρ dx}^{1/ρ}; then it is well
known that

min_{y_1, ..., y_N} (1) ~ N^{-β} R_{2,s} ||p||_ρ,    (2)

where R_{2,s} is the normalized s-th moment of the regular
hexagon. This result holds under a mild regularity condition
on p, including the moment condition ∫ ||x||^{s+ε} p(x) dx < ∞
for any ε > 0. On the one hand, the result means that the
optimal quantizer has a density proportional to p^ρ(x); on
the other hand, each point in the quantizer, which we call a
generator, has a Voronoi region almost similar to the
regular hexagon. Only a few works [1][2][3][4] study these seemingly
contradictory facts. We continue them and propose a new
asymptotic approach to the design of a two-stage quantizer.

II.  Results 

Define g(x) := p^ρ / ∫ p^ρ dx, thus N g is the optimal number
density of generators. We identify R^2 with the complex plane C.
Let C be decomposed into domains C = ∪_ξ U_ξ, where U_ξ is
indexed by some central point ξ. We design a compressor, i.e.,
a mapping φ(z; ξ) from the distribution space U_ξ (parameterized
by z = z_1 + i z_2) to the quantization space (parameterized
by φ), such that φ(ξ; ξ) = 0.

At  first  let  l(z)  :=  lnp(s),  and  define  a  holomorphic  func¬ 
tion 

a  ■■=  m  +  (*  -  o am  +  l(z  -  o292/(o,  (3) 

where  d  =  -  i^-.  Using  this  function  we  define 

g(z)  :=  cf|eL(:;t)|  for  c  €  U(,  (4) 

*A  part  of  the  work  was  done  while  the  author  stayed  in  1996 
at  Information  Systems  Laboratory,  Dept,  of  EE,  Stanford  liniv. 


where  the  normalizing  constant  c£  is  determined  such  that 
g(U()  =  g(U(). 

For  a  phase  8(£)  6  [0,  27t]  given  at  £,  we  design  the  com¬ 
pressor  by  the  complex  integral 


-i; 


~  0(Oi}dz 


(5) 


The  inverse  image  of  this  function  of  a  hexagonal  lattice 
spanned  by  A  and  Ae^*1,  with  the  lattice  constant  A  =  -n=-^= , 

.  #  v3  vN 

approximate  the  optimal  quantizer.  We  can  also  argue  that 
the  angular  phase  6(()  satisfies  the  partial  differential  equa¬ 
tion: 

j=i 

through  our  two-stage  model,  where  e,j  being  of  Eddington. 
We can also verify this by experiments [4].

Define  the  optimal  distortion  as 

Doo  :=  N  3 Rj.s  f  g(.r)~:1p(x)dx.  Let  Ng(.r)  be  the  actual 
quantizer  number  density  defined  as  above  and  let  it  yield 
a  distortion  Dg.  Then  the  relative  distortion  mismatch  can 
be  formulated  as  follows.  We  assume  that  N  is  large  enough 
while  the  number  of  partitioned  domain  is  finite. 

Fact 


/3(/J  +  l)r.,IMri  , 

- - -  /  A  variance  of 

t 

i||.r-£||2  Al(0  with'MUt).}  (7) 

where  g(-\U()  is  a  conditional  distribution  of  g  in  U(,  and.  the 
asymptotics  hold  when  the  diameters  of  U(S  are  sufficiently 
small,  and  A  represents  the  Laplacian  operator. 

Both  ‘domain  effect’  and  ‘boundary  effect’  can  contribute 
to  the  actual  relative  distortion  mismatch.  When  the  cluster 
centers  have  a  number  density  Nqk(z),  where  f  k(z)  =  1,  and 
if  1  <<  Nq  <<  A1/5,  and  also  under  a  working  assumption 
that  the  cluster  centers  form  a  Voronoi  diagram  with  each 
Voronoi  cell  being  almost  regular  hexagon,  then  the  formula 
(7)  takes  the  following  minimum: 


Dg  ~  Do 
D * 


-  Rl  ,)^-p{  j  p(  x  )pD  ( A  In  p{.r )  )2^3  }3 ,  (8) 
when  k(x)  =  const. p(.r)p/3{  A  lnp(;r))2/3. 

Abstract - A common belief in quantization theory
says that the quantization noise process resulting
from uniform scalar quantization of a correlated
discrete time process tends to be white in the limit of
small distortion (“high resolution”). We show that
the quantization errors resulting from independent
non-uniform vector quantizations of dependent real
random vectors become asymptotically uncorrelated
if the joint Fisher information under translation of the
two vectors is finite and the quantization cells shrink
uniformly as the distortion tends to zero.

I.  Introduction 

The Asymptotic Whiteness Property (AWP) of the quantization
error process [2, sec. 5.6] says that the quantization
noise process resulting from uniform scalar quantization of a
correlated discrete time process tends to be white in the limit
of small distortion (“high resolution”). The AWP also gives
interesting insight into the behavior of multiterminal coding
of correlated continuous sources [3], where the correlation
between the errors at separate terminals may affect the
estimation error at the centralized decoder. Our main result in
this paper generalizes the AWP to non-uniform quantization.
Unlike lattice quantization, in this case the quantization cells
are not necessarily convex, and may even be unions of
disconnected regions, as happens in the case of multiterminal source
coding [3]. However, while a sufficient condition for AWP for
vector lattice quantization is that the pair X_n and X_{n+k} have
a joint probability density function and finite power, the more
general formulation of the AWP requires stronger conditions
on the joint distribution of (X_n, X_{n+k}).

The  intuition  behind  the  AWP  comes  from  the  combination 
of  two  ideas: 

1. Local uniformity: If the joint distribution of the
source samples is “smooth”, then it is approximately
uniform inside small cells (corresponding to high
resolution quantization).

2. Rectangular partition: Independent quantization of
random variables $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$ induces a
rectangular (“Cartesian”) partitioning of the
$(\mathcal{X}, \mathcal{Y})$-plane.

The property of rectangular partition above seems simple
and clear. The main purpose of this paper is to make a precise
statement of the idea of local uniformity, to propose a
sufficient condition for it to hold and to prove a general form
of the AWP using the local uniformity condition. For lattice
quantization, existence of the joint probability density of
the source turns out to be sufficient. For general non-uniform
quantization our condition is based on the finiteness of the
Fisher Information under translation [1], a quantity which is
a function of the joint distribution of the source samples, and
a moment condition defined below (2).


II.  Summary  of  Results 
Let $X \in \mathcal{X}$, $Y \in \mathcal{Y}$, where $\mathcal{X} = \mathcal{Y} = \mathbb{R}^k$, be random vectors
with joint density $p(x,y)$. Let

$$i(x): \mathcal{X} \to \{1,2,\ldots,N_x\}, \qquad j(y): \mathcal{Y} \to \{1,2,\ldots,N_y\}$$

induce two partitions of $\mathbb{R}^k$ corresponding to independent
quantization of $X$ and $Y$, respectively. Let
$(\hat{x}, \hat{y}) = Q(i(x), j(y))$ denote the quantizer reconstruction.
We define $Q(i,j)$ to be the joint centroid of the cell relative
to the source distribution.

Consider a sequence of pairs of partition functions
$i_N(x)$, $j_N(y)$ of $\mathcal{X}$, $\mathcal{Y}$, $N = 1, 2, \ldots$, and a corresponding
sequence of reconstruction functions $(\hat{x}_N, \hat{y}_N)$, such that

$$D_{x,N} \triangleq E\|X - \hat{X}_N\|^2 \to 0, \qquad D_{y,N} \triangleq E\|Y - \hat{Y}_N\|^2 \to 0 \qquad (1)$$

at the same rate. Assume that there exists some $\delta > 0$ such
that


i+« 


<  oo. 


lim  sup  E 

(2) 

Define the joint Fisher Information (FI) under translation of
$(X, Y)$ [1] as

$$J(X,Y) \triangleq \int \frac{1}{p(x,y)} \left\| \frac{\partial p(x,y)}{\partial (x,y)} \right\|^2 dx\,dy.$$


Theorem 1 Let $(X,Y) \in (\mathcal{X}, \mathcal{Y})$, where $\mathcal{X} = \mathcal{Y} = \mathbb{R}^k$,
be correlated random vectors with continuous source density
$p(x,y)$, a.s. continuously differentiable $\ln p(x,y)$ and joint FI
$J(X,Y) < \infty$. Let $i_N(x)$ and $j_N(y)$ be a sequence of
independent partition functions of $\mathcal{X}$ and $\mathcal{Y}$, let $\hat{x}_N$ and
$\hat{y}_N$ be the corresponding reconstructions, and let

$$\rho_N \triangleq \frac{E\{(X - \hat{X}_N)^t (Y - \hat{Y}_N)\}}{\sqrt{D_{x,N}\, D_{y,N}}} \qquad (3)$$

be the correlation coefficient between the quantization errors.
If the sequence $(\hat{x}_N, \hat{y}_N)$ satisfies (1) and (2), then

$$\rho_N \to 0 \quad \text{as } N \to \infty. \qquad (4)$$
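
The classical scalar case of this property can be checked numerically. The sketch below is an illustration only and is not the construction used in the theorem: it assumes a unit-variance jointly Gaussian pair with correlation 0.9 and a uniform scalar quantizer of step `step` (the function name and all parameters are our own choices), and it estimates the correlation coefficient between the two quantization errors.

```python
import math
import random


def error_correlation(step, rho=0.9, n=100_000, seed=0):
    """Sample correlation coefficient of the two quantization errors."""
    rng = random.Random(seed)
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = u
        y = rho * u + math.sqrt(1.0 - rho * rho) * v  # corr(x, y) = rho
        ex = x - step * round(x / step)  # error of a uniform scalar quantizer
        ey = y - step * round(y / step)
        sx += ex
        sy += ey
        sxx += ex * ex
        syy += ey * ey
        sxy += ex * ey
    mx, my = sx / n, sy / n
    cov = sxy / n - mx * my
    return cov / math.sqrt((sxx / n - mx * mx) * (syy / n - my * my))


for step in (8.0, 1.0, 0.1):
    print(step, error_correlation(step))
```

Under these assumptions a very coarse step such as 8 leaves the errors almost as correlated as the source itself, while a fine step such as 0.1 gives an estimate close to zero.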

Abstract - We show that there can be an arbitrary
discrepancy between the worst-case rate required for
scalar and vector quantization. Specifically, for every
$\delta$, however large, and every $\epsilon > 0$, however small,
there is a random variable and a distortion measure where
quantization of a single instance within a given distortion
requires more than $\delta$ bits in the worst case, but
quantization of multiple independent instances within the
same distortion requires at most $\epsilon$ bits per instance in
the worst case. Furthermore, these discrepancies can be
achieved by simple distortion measures that attain just two
values: 0 and $\infty$.

I.  Summary 

The results follow from a judicious application of the
following examples.

Example  1  (Mail  order,  see  Slepian,  Wolf,  and  Wyner  [1] 
for  average-case  analysis.)  A  mail-order  firm  sells  n  different 
shirts.  Experience  has  shown  that  each  customer  likes  m  of 
the  n  shirts  and  wants  to  get  just  one  of  them.  For  example,  a 
customer  may  like  all  m  blue  shirts  and  have  no  preference  for 
one  blue  shirt  over  another,  while  another  customer  may  want 
to  buy  any  one  of  the  m  shirts  designed  by  Giorgio  Armani. 

The  firm  designs  a  new  order  form.  It  would  like  to  know 
the  shortest  length  of  the  reply  field  which  the  customer  fills 
out  to  request  one  of  his  m  favorite  shirts.  In  other  words,  the 
firm is interested in $L(n,m)$, the smallest number of bits the
customer  must  specify  for  the  “worst”  set  of  m  shirts.  Note 
that  n  and  m  are  known  in  advance  and  the  only  uncertainty 
is  which  set  of  m  shirts  the  customer  likes. 

For example, if $m = 1$ every customer likes exactly one
shirt and wants to get it. Clearly the shirt must be completely
specified, so $L(n,1) = \lceil \log n \rceil$. On the other extreme, if $m = n$
each customer likes all $n$ shirts and the firm can mail him any
of them. Hence no bits need to be transmitted, so $L(n,n) = 0$.

One can show (proof in full version) that in general,

(1) $L(n,m) = \lceil \log(n - m + 1) \rceil$. □
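
Formula (1) is easy to evaluate. The sketch below assumes base-2 logarithms, since the text counts bits, and uses a hypothetical helper name `worst_case_bits`. It relies on the identity that, for an integer t >= 1, the ceiling of log2(t) equals `(t - 1).bit_length()`, which avoids floating-point rounding.

```python
def worst_case_bits(n: int, m: int) -> int:
    """Return ceil(log2(n - m + 1)), the value of L(n, m) in formula (1)."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    # For an integer t >= 1, ceil(log2(t)) equals (t - 1).bit_length().
    return (n - m).bit_length()


print(worst_case_bits(8, 1))  # m = 1: the shirt must be fully specified
print(worst_case_bits(8, 8))  # m = n: nothing needs to be sent
```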

Next  we  consider  independent  repetitions  of  the  previous 
scenario  and  compare  the  number  of  bits  required  by  treating 
each  case  individually  to  their  combined  treatment. 

Example  2  (Multiple  mail  orders.)  The  mail-order  firm 
expands  into  k  product  lines.  In  addition  to  shirts  it  now  sells, 
say,  pants,  shoes  and  ( k  —  3)  other  product  lines.  Again,  all 
customers  exhibit  the  same  buying  pattern:  Every  customer 
considers  all  k  product  lines.  In  each  line  the  customer  likes  m 
items  and  wants  to  receive  one.  There  is  no  relation  between 
the  items  liked  in  different  product  lines. 

For  example,  a  customer  may  like  all  m  striped  shirts,  all 
m  pants  whose  catalog  number  is  a  prime,  and  so  on  for  the 
other  lines.  He  then  wants  to  get  one  striped  shirt,  one  prime- 
numbered  pair  of  pants,  etc. 

Supported  by  NSF  Grant  #CCR-9815018. 


We are interested in $L_k(n,m)$, the number of bits the
customer must transmit in the worst case. No errors are
tolerated, so the customer always receives $k$ products, one from
each line, and likes all of them. By definition, $L_1(n,m) =
L(n,m)$. We would like to know how $L_k(n,m)$ grows with $k$.

By  treating  each  product  line  separately  and  describing  the 
smallest-numbered  desirable  item  in  each  line,  we  see  that 

$$L_k(n,m) \le \lceil \log (n-m+1)^k \rceil \approx k \lceil \log(n-m+1) \rceil = k L(n,m).$$

Since the sets of desirable items in different product lines (say
shirts and pants) are completely independent of each other,
knowing one set conveys no information about the other.
One could therefore be tempted to believe that this upper
bound is tight, and only roundoff bits ($\lceil k \log(n-m+1) \rceil$ vs.
$k \lceil \log(n-m+1) \rceil$) can be saved. This is not the case. We
show that for all integers $m \le n$ and $k$,

(2) $L_k(n,m) \le k \log \frac{n}{m} + \log n + \log k.$

The  proof  is  similar  to  one  used  in  Alon  and  Orlitsky  [2] 
and  will  be  provided  in  the  full  version  of  this  paper. 

To gain intuition about this result, suppose first that $n$ is
even and $m = n/2$. Namely, each customer likes half the items
in each line. Specifying one item takes

$$L(n, \tfrac{n}{2}) = \lceil \log(n - \tfrac{n}{2} + 1) \rceil = \lceil \log(\tfrac{n}{2} + 1) \rceil$$

bits. For multiple lines, the customer can describe each line
separately using $\lceil k \cdot \log(n/2 + 1) \rceil > k \cdot (\log n - 1)$ bits.
However, Inequality (2) shows that the number of bits needed is

$$L_k(n, n/2) \le k \cdot \log \frac{n}{n/2} + \log n + \log k = \log n + k + \log k.$$

It follows that while the first product line takes $\log n - 1$
bits to describe, the second product line requires at most
two additional bits, and subsequent lines add even fewer
bits. In the limit, the number of bits per line is only
$\lim_{k \to \infty} (\log n + k + \log k)/k = 1$, significantly less than the
$\log n - 1$ bits per line needed to describe each line separately.
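
The gap between the two descriptions can be tabulated directly. The following sketch assumes base-2 logarithms and uses hypothetical function names; `separate_bits` is the cost of describing each line on its own, and `joint_bound_bits` is the right-hand side of Inequality (2), which is an upper bound and not necessarily the exact value of the worst-case bit count.

```python
import math


def separate_bits(n: int, m: int, k: int) -> int:
    """Cost of describing each of the k lines separately: k * ceil(log2(n - m + 1))."""
    return k * (n - m).bit_length()


def joint_bound_bits(n: int, m: int, k: int) -> float:
    """Right-hand side of Inequality (2): k*log2(n/m) + log2(n) + log2(k)."""
    return k * math.log2(n / m) + math.log2(n) + math.log2(k)


n, m = 1024, 512  # each customer likes half of the items in every line
for k in (1, 10, 100, 1000):
    print(k, separate_bits(n, m, k) / k, joint_bound_bits(n, m, k) / k)
```

For k = 1 the bound is weaker than formula (1); the saving appears as k grows, where the per-line cost of the joint bound approaches one bit while the separate description stays at ten bits per line for these values.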

Returning to the general case of Inequality (2), we see that
after the initial $\log(n - m + 1)$ bits, additional product lines
require about $\log \frac{n}{m}$ bits per line. Consequently, for every
$\delta$, however large, and every $\epsilon > 0$, however small, one can
choose $m$ and $n$ so that a single line would require $> \delta$ bits
while multiple lines would need $< \epsilon$ bits per instance. □

The  average-case  analysis  of  Example  2  will  be  carried  out 
in  the  full  version  of  this  paper. 

Abstract - In this paper, we determine the maximal rates
at which random number generators can generate a random
sequence with an arbitrary prescribed distribution from a
random sequence with an arbitrary given distribution.

I.  Introduction 

One way of generalizing the random number generation problem
is to relax the requirement that the target random numbers
should be generated exactly according to the prescribed
distribution. We are especially concerned with the case of
fixed-length random number generation. Let $\mathcal{X}$ and $\mathcal{Y}$ be
countably infinite sets. Let us define general sources as infinite
sequences $X = \{X^n\}_{n=1}^{\infty}$ of $n$-dimensional random variables $X^n$
taking values in $\mathcal{X}^n$ and $Y = \{Y^m\}_{m=1}^{\infty}$ of $m$-dimensional
random variables $Y^m$ taking values in $\mathcal{Y}^m$.

In this paper, we investigate the maximal rate at which
random number generators can generate a random sequence
with an arbitrary prescribed distribution from a random
sequence with an arbitrary given distribution in the sense
of vanishing variational distance. The variational distance
between two distributions $P_Z$ and $P_{\tilde{Z}}$ on $\mathcal{Z}$ is defined as follows:

$$d(Z, \tilde{Z}) = \sum_{z \in \mathcal{Z}} |P_Z(z) - P_{\tilde{Z}}(z)|. \qquad (1)$$
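
As a small illustration of definition (1), the sketch below computes the variational distance between two distributions with finite support. The dictionary representation and the function name are assumptions made for the example; the sum is taken without a factor 1/2, as in (1).

```python
def variational_distance(p: dict, q: dict) -> float:
    """Sum over all symbols of |p(z) - q(z)|; a missing symbol has probability 0."""
    support = set(p) | set(q)
    return sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)


fair = {"a": 0.5, "b": 0.5}
point = {"a": 1.0}
print(variational_distance(fair, point))
```

With this normalization the distance ranges from 0 for identical distributions to 2 for distributions with disjoint supports.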

In this setting, there are two cases of fixed-length random
number generation. In the first, every realization of $n_m$
source symbols is deterministically transformed into a
sequence of length $m$, where $n_m$ depends only on $m$. In the
second, every realization of $n$ source symbols is
deterministically transformed into a sequence of length $m_n$.

II.  Formulation  of  the  problem 

Definition II.1 $R$ is called a type A achievable rate for the
sources $X$ and $Y$ if there exists a sequence of mappings
$\varphi_n : \mathcal{X}^n \to \mathcal{Y}^{m_n}$ such that

$$\liminf_{n \to \infty} \frac{m_n}{n} \ge R, \qquad (2)$$

$$\lim_{n \to \infty} d(Y^{m_n}, \varphi_n(X^n)) = 0. \qquad (3)$$

Moreover, the supremum of all type A achievable rates for the
sources $X$ and $Y$ is denoted by $S_A(X, Y)$, which we call the
maximal type A achievable rate.

Definition II.2 $R$ is called a type B achievable rate for the
sources $X$ and $Y$ if there exists a sequence of mappings
$\varphi_m : \mathcal{X}^{n_m} \to \mathcal{Y}^m$ satisfying Formulas (2) and (3) with $n_m$
and $m$ in place of $n$ and $m_n$, respectively. Moreover, the
supremum of all type B achievable rates for the sources
$X$ and $Y$ is denoted by $S_B(X, Y)$, which we call the maximal
type B achievable rate.

This research was supported in part by Waseda University under
Grant 99A-551 for Special Research Projects.


III. Main Results

We denote the limsup in probability of $\frac{1}{n} \log \frac{1}{P_{Z^n}(Z^n)}$
and the liminf in probability of that by $\overline{H}(Z)$ and $\underline{H}(Z)$,
respectively [1][2][3]. Then we have


Theorem  III.  1 


H( Y)  -  v  Y) 


H(X).  (H(X)  H(X)\ 

-  B  (  ’ Y)  - mm  Um  ■ ■  m?) )  ■  (5) 

We notice that if either source $X$ or source $Y$ satisfies the
strong converse property [3], then


Sa  (X,  Y)  =  SB  (X,Y)  =  =J-(.  (6) 

h  (Y) 

In the case that source $Y$ is uniformly distributed, i.e.,
$P_Y(Y) = 1/M$ ($M < \infty$), replacing $m_n$ with $\log M_n$ in
Formula (2) of Definition II.1 gives a problem equivalent to the
intrinsic randomness problem defined by Vembu and Verdú [1]. Then,

$$S_A(X, Y) = \underline{H}(X), \qquad (7)$$

where $M_n = M^{m_n}$. On the other hand, in the case that source
$X$ is uniformly distributed, i.e., $P_X(X) = 1/M$, replacing
$n_m$ with $\log M_m$ in Definition II.2 makes the minimum of the
reciprocal of the type B achievable rate equivalent to the minimal
achievable resolvability rate defined by Han and Verdú [2], i.e.,

$$S_B(X, Y) = \frac{1}{\overline{H}(Y)}, \qquad (8)$$

where $M_m = M^{n_m}$.

For the reasons stated above, the essential condition under
which the maximal achievable rate is uniquely determined is that
either source $X$ or source $Y$ satisfies the strong converse
property. Since uniform distributions satisfy the strong converse
property, both the maximal achievable intrinsic randomness rate [1]
and the minimal achievable resolvability rate [2] are special
cases of Theorem III.1.

IV.  Conclusion 

We have defined two types of random number generation
problems and obtained two maximal achievable rates. Both the
intrinsic randomness problem [1] and the resolvability problem [2]
are special cases of our result.

Abstract - We describe an information-spectrum
approach to the rate-distortion function with side
information at the decoder for the general class of
nonstationary and/or nonergodic sources, where the
distortion measure is arbitrary and may be nonadditive.
We establish a general formula for the rate-distortion
function of the Wyner-Ziv problem [1] for general
sources with the maximum distortion criterion under
fixed-rate coding.

Let us define a general source $X$ as an infinite sequence
$X = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$ of $n$-dimensional
random variables $X^n$, where each component random variable
$X_i^{(n)}$ ($1 \le i \le n$) takes values in a countably infinite set
$\mathcal{X}$ that we call the source alphabet. We use the convention
defined in Han [2]. We consider the class of correlated sources
$X = \{X^n\}_{n=1}^{\infty}$, $Y = \{Y^n\}_{n=1}^{\infty}$ that are quite general. We use
$X$ as the source for the encoder and $Y$ as the side information
for the decoder, as shown in Fig. 1.

In order to define a distortion measure, we need to specify
another countably infinite set $\hat{\mathcal{X}}$, which is called the
reproduction alphabet. Then $d_n(x, \hat{x})$ is called the distortion
between $x \in \mathcal{X}^n$ and $\hat{x} \in \hat{\mathcal{X}}^n$, $d_n : \mathcal{X}^n \times \hat{\mathcal{X}}^n \to [0, \infty)$,
and the normalized distortion is bounded by $d_{\max}$ such that
$\frac{1}{n} d_n(x, \hat{x}) \le d_{\max}$ for all $x \in \mathcal{X}^n$, $\hat{x} \in \hat{\mathcal{X}}^n$. Furthermore, let
us consider any reproduction process $\hat{X}$ of $n$-dimensional
random variables $\hat{X}^n$. Moreover, we need the concept of “limsup
in probability”. For any sequence $\{A_n\}_{n=1}^{\infty}$ of random
variables, the infimum of $\alpha$ such that $\lim_{n \to \infty} \Pr\{A_n > \alpha\} = 0$
is called the limsup in probability of $\{A_n\}_{n=1}^{\infty}$ and is
indicated by $\text{p-}\limsup_{n \to \infty} A_n$. Then we consider the sequence
of the normalized distortions $\{\frac{1}{n} d_n(X^n, \hat{X}^n)\}_{n=1}^{\infty}$, the
limsup in probability of which is denoted by $D(X, \hat{X})$, i.e.,
$D(X, \hat{X}) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} d_n(X^n, \hat{X}^n)$.

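A code is defined by two mappings: Encoder $\varphi_n : \mathcal{X}^n \to \mathcal{I}_{k_n}$
and Decoder $\psi_n : \mathcal{Y}^n \times \mathcal{I}_{k_n} \to \hat{\mathcal{X}}^n$, where
$\mathcal{I}_{k_n} = \{1, 2, \ldots, k_n\}$. The limit superior of the code length
per source letter $\limsup_{n \to \infty} \frac{1}{n} \log |\varphi_n|$ is called the rate of
the encoder $\varphi_n$, where $|\varphi_n|$ denotes the cardinality of the
range of $\varphi_n$.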

For a given general source $X$ and distortion $D$, a rate $R$ is
called achievable with side information $Y$ if there exists a code
$(\varphi_n, \psi_n)$ such that $\text{p-}\limsup_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(Y^n, \varphi_n(X^n))) \le D$
and $\limsup_{n \to \infty} \frac{1}{n} \log |\varphi_n| \le R$. Moreover,
$R(D) = \inf\{R \mid R \text{ is achievable with side information for given } D\}$.

Figure 1: Wyner-Ziv type communication system.

In order to give the characterization of the general
rate-distortion functions, we define the mutual information
spectrum-sup. Given any three correlated processes,
$X = \{X^n\}_{n=1}^{\infty}$, $Y = \{Y^n\}_{n=1}^{\infty}$ and $Z = \{Z^n\}_{n=1}^{\infty}$, we define
the sequence of the normalized information densities

$$\left\{ \frac{1}{n} \log \frac{P_{Z^n|X^nY^n}(Z^n|X^nY^n)}{P_{Z^n|Y^n}(Z^n|Y^n)} \right\}_{n=1}^{\infty}, \qquad (1)$$

where we use the convention that $P_{Y|X}$ denotes the conditional
probability distribution of $Y$ given $X$. Then the limsup in
probability of (1) is denoted by $\overline{I}(X;Z|Y)$, i.e.,
$\overline{I}(X;Z|Y) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{P_{Z^n|X^nY^n}(Z^n|X^nY^n)}{P_{Z^n|Y^n}(Z^n|Y^n)}$,
which we call the conditional mutual information spectrum-sup.


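Theorem 1 For given $X$, $Y$ and $D$,

$$R(D) = \inf \overline{I}(X;Z|Y),$$

where the infimum is over $Z$ and $\{f_n(\cdot,\cdot)\}_{n=1}^{\infty}$ satisfying a) and b) below:

a) $Y^n \to X^n \to Z^n$ is a Markov chain for $n = 1, 2, \ldots$; hence
$P_{X^nY^nZ^n}(x,y,z) = P_{X^nY^n}(x,y) P_{Z^n|X^n}(z|x)$ holds for all
$n = 1, 2, \ldots$ and for all $x \in \mathcal{X}^n$, $y \in \mathcal{Y}^n$, and $z \in \mathcal{Z}^n$.

b) there exists a sequence of functions $\{f_n(\cdot,\cdot)\}_{n=1}^{\infty}$,
$f_n : \mathcal{Y}^n \times \mathcal{Z}^n \to \hat{\mathcal{X}}^n$, such that $\hat{X} = \{f_n(Y^n, Z^n)\}_{n=1}^{\infty}$ and
$D(X, \hat{X}) \le D$.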


In order to specify the code, we need a function $F_n : \mathcal{X}^n \to
\{z_i\}_{i=1}^{M'_n} \subset \mathcal{Z}^n$, where $M'_n \le e^{n(\overline{I}(X;Z) + \gamma)}$ and $\gamma > 0$. $F_n$ is
due to an extended version [3] of Lemma 4.3 of [4].

1. Generation of codebook: Let $M_n = e^{n(\overline{I}(X;Z|Y) + 2\gamma)}$ and
make $M_n$ bins. Randomly assign each $F_n(x)$, $x \in \mathcal{X}^n$, to one of
the $M_n$ bins using a uniform distribution over the bins.

2. Encoding $\varphi_n : \mathcal{X}^n \to \mathcal{I}_{M_n}$. Given a source output $x \in \mathcal{X}^n$
from $X$, the encoder looks for $z_i = F_n(x)$. Then the encoder
sends the index $j \in \mathcal{I}_{M_n}$ of the bin to which $z_i$ belongs.

3. Decoding $\psi_n : \mathcal{I}_{M_n} \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n$. The decoder receives an
output $j = \varphi_n(x)$ from the encoder and an output $y \in \mathcal{Y}^n$ as the
side information from $Y$. If he can find a unique $z_i$ which
belongs to the bin of index $j$ and satisfies $(z_i, y) \in \{\, (y, z) \in$

ynxZn  I  £  log  <  7(X;  Z|  Y)+7} ,  then  he  has 

$\psi_n(j, y) = f_n(y, z_i)$ by using $f_n(\cdot,\cdot)$ defined in property b). If
he does not find such a unique $z_i$, then he sets $\psi_n(j, y) = \hat{x}$,
where $\hat{x}$ is an arbitrary sequence in $\hat{\mathcal{X}}^n$.
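
The three steps above follow the usual random-binning pattern, which can be sketched on a toy alphabet. Everything named here is hypothetical: `quantize` stands in for $F_n$, `reconstruct` for $f_n$, and `compatible` is only a placeholder for the information-density test used by the decoder, which this sketch does not reproduce.

```python
import random


def make_bins(codewords, num_bins, seed=0):
    """Step 1: assign every codeword to a bin chosen uniformly at random."""
    rng = random.Random(seed)
    return {z: rng.randrange(num_bins) for z in codewords}


def encode(x, quantize, bin_of):
    """Step 2: map x to its codeword and send only the index of its bin."""
    return bin_of[quantize(x)]


def decode(j, y, bin_of, compatible, reconstruct, fallback):
    """Step 3: search bin j for the unique codeword compatible with y."""
    candidates = [z for z, b in bin_of.items() if b == j and compatible(z, y)]
    if len(candidates) == 1:
        return reconstruct(y, candidates[0])
    return fallback  # arbitrary output when no unique candidate exists


bins = make_bins(range(16), num_bins=4)
j = encode(9, lambda x: x, bins)
print(j, decode(j, 10, bins, lambda z, y: abs(z - y) <= 1, lambda y, z: z, None))
```

With randomly drawn bins the decoder may fail when two compatible codewords share a bin; the coding argument chooses the number of bins so that this event becomes unlikely.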

The  converse  part  is  due  to  a  modified  version  of  Lemma 
2.4  of  [2]. 

Abstract - In this work, Csiszár’s fixed-length
source coding $\beta$-cutoff rates are investigated for the
class of arbitrary discrete sources with memory. It
is demonstrated that the limsup and liminf Rényi
entropy rates provide the formulas for the forward
and reverse $\beta$-cutoff rates, respectively. Consequently,
new fixed-length source coding operational
characterizations for the Rényi entropy rates are established.

I.  Introduction 

In [2], Csiszár establishes the concept of generalized
fixed-length source coding cutoff rates (forward and reverse) for
discrete memoryless sources. More specifically, given $\beta > 0$,
he defines the forward $\beta$-cutoff rate for a source $\{X_i\}_{i=1}^{\infty}$ as
the number $R_0$ that provides the best possible lower bound
in the form $\beta(R - R_0)$ to the source reliability function. This
definition implies that the source error probability is
guaranteed to exponentially decay with a linear exponent of
specified slope $\beta$ for $R > R_0$. He also provides a similar
definition for the reverse $\beta$-cutoff rate (where $\beta > 0$) with
respect to the source unreliability function (the exponent of the
vanishing probability of correct decoding). He then demonstrates
that the forward and reverse $\beta$-cutoff rates are respectively
given by $H_{1/(1+\beta)}(X_1)$ and $H_{1/(1-\beta)}(X_1)$, where $H_\alpha(X_1)$ denotes
the Rényi entropy of order $\alpha$.

In this work, we extend Csiszár’s results [2] by investigating
the $\beta$-cutoff rate for arbitrary (not necessarily stationary,
ergodic, etc.) discrete-time finite-alphabet sources
$X = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$ [3]. We demonstrate that
the limsup and liminf Rényi entropy rates provide the
expressions for the forward and reverse $\beta$-cutoff rates,
respectively. These results also provide simple and, in certain
cases, computable lower bounds to the source reliability and
unreliability functions.

II.  Main  results 

Definition 1 An $(n, M)$ fixed-length source code for $X^n$ is
a collection of $M$ $n$-tuples $\mathcal{C}_n = \{c_1^n, \ldots, c_M^n\}$. The error
probability of the code is $P_e(\mathcal{C}_n) = P_{X^n}[X^n \notin \mathcal{C}_n]$.

Definition 2 Fix $e > 0$. $R > 0$ is $e$-achievable for a source
$X$ if there exists a sequence of $(n, M_n)$ fixed-length source
codes $\mathcal{C}_n$ such that

$$\limsup_{n \to \infty} \frac{1}{n} \log M_n < R \quad \text{and} \quad \liminf_{n \to \infty} -\frac{1}{n} \log P_e(\mathcal{C}_n) > e.$$

Fix $\beta > 0$. The forward $\beta$-cutoff rate for $X$, denoted by
$R_0^{(f)}(\beta|X)$, is defined as the smallest $R_0 > 0$ such that
every $R > 0$ is $\beta(R - R_0)$-achievable.

This  work  was  supported  in  part  by  Queen’s  University,  NSERC 
of  Canada  and  NSC  of  Taiwan,  R.O.C. 


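Theorem 1 (Forward $\beta$-cutoff rate [1]) Fix $\beta > 0$. For
an arbitrary source $X$,

$$R_0^{(f)}(\beta|X) = \limsup_{n \to \infty} \frac{1}{n} H_{1/(1+\beta)}(X^n),$$

where

$$H_\alpha(X^n) = \frac{1}{1 - \alpha} \log \sum_{x^n \in \mathcal{X}^n} P_{X^n}^{\alpha}(x^n)$$

is the ($n$-dimensional) Rényi entropy of order $\alpha$.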
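
For a memoryless source the expression in Theorem 1 reduces to the single-letter quantity $H_{1/(1+\beta)}(X_1)$ quoted in the Introduction, because the Rényi entropy of independent copies adds up. The sketch below checks this numerically; the function names, the example distribution and the use of natural logarithms are assumptions made for the illustration.

```python
import math
from itertools import product


def renyi_entropy(probs, alpha):
    """Renyi entropy (natural log) of order alpha != 1 of a finite distribution."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


def forward_cutoff_rate_memoryless(probs, beta):
    """H_{1/(1+beta)}(X_1): forward beta-cutoff rate of a memoryless source."""
    return renyi_entropy(probs, 1.0 / (1.0 + beta))


p = [0.7, 0.2, 0.1]
# Distribution of two independent copies of the source letter.
p2 = [a * b for a, b in product(p, p)]
beta = 1.0
print(forward_cutoff_rate_memoryless(p, beta))
print(renyi_entropy(p2, 1.0 / (1.0 + beta)) / 2)  # same value per letter
```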

Definition 3 Fix $e > 0$. $R > 0$ is reverse $e$-achievable for
a source $X$ if there exists a sequence of $(n, M_n)$ fixed-length
source codes $\mathcal{C}_n$ such that

$$\limsup_{n \to \infty} \frac{1}{n} \log M_n < R \quad \text{and} \quad \liminf_{n \to \infty} -\frac{1}{n} \log(1 - P_e(\mathcal{C}_n)) < e.$$

Fix $\beta > 0$. The reverse $\beta$-cutoff rate for $X$, denoted by
$R_0^{(r)}(\beta|X)$, is defined as the largest $R_0$ such that every $R > 0$
is reverse $\beta(R - R_0)$-achievable.

Theorem 2 (Reverse $\beta$-cutoff rate [1]) Fix $0 < \beta < 1$.
For any source $X$,

$$R_0^{(r)}(\beta|X) = \liminf_{n \to \infty} \frac{1}{n} H_{1/(1-\beta)}(X^n).$$

III.  Conclusions 

In  closing,  we  would  like  to  make  the  following  observations. 

• It is important to point out that if the source $X$ is a
time-invariant Markov source of arbitrary order, then its Rényi
entropy rate exists and can be computed [4]. Thus in this case,
the $\beta$-cutoff rates for this source can be obtained.

• A convex lower bound can be obtained on the source
reliability function. It consists of the supremum of all the support
lines with slope $\beta$ which pass through the point $(R_0^{(f)}(\beta|X), 0)$,
given by $\sup_{\beta > 0} [\beta(R - R_0^{(f)}(\beta|X))]$ for every $R > 0$. We can
thus conclude that for the class of sources $X$ for which the
Rényi entropy rate can be calculated (e.g., the class of Markov
sources), a computable lower bound to the source reliability
function can also be obtained. A similar remark applies for
the source unreliability function.

Abstract - In this paper we analyze Shannon’s cipher
system with the general source [1]. We completely
determine the achievable rate region of a cryptogram
and a key required for encryption of an output
of the sources with the one-point spectrum. Inner
and outer bounds are given for the other sources.


I.  Introduction 

This paper attempts to analyze Shannon’s cipher system [2]
from the viewpoint of the information-spectrum method
originating from [1]. Figure 1 shows Shannon’s cipher system. For
each $n \ge 1$ let $S^n$ be a random variable from a source taking
values in $\mathcal{S}^n$. The cardinality of the alphabet $\mathcal{S}$ is either finite or
countably infinite. Let $E_n$ be the uniformly distributed random
variable on a finite alphabet $\mathcal{E}_n$ from a key generator.
The key $E_n$ is transmitted to both an encoder and a decoder
through a secret channel perfectly protected against wiretappers.
The encoder encrypts $S^n$ into a cryptogram $W_n \in \mathcal{W}_n$
under $E_n$ as $W_n = f_n(S^n, E_n)$, where $f_n$ is a deterministic
function. The encoder transmits $W_n$ to a decoder through
a public channel in the presence of the wiretappers. Therefore,
$W_n$ is required not to reveal information on $S^n$. The
decoder decrypts $W_n$ under $E_n$ and reproduces $S^n$ with small
decoding error probability by using a deterministic function
$g_n : \mathcal{W}_n \times \mathcal{E}_n \to \mathcal{S}^n$.

In this paper we consider the case that the decoding error
probability tends to zero as $n \to \infty$. We characterize
achievable rates required for transmission of $W_n$ and $E_n$ subject to
a new criterion on secrecy of the encryption.

II. Coding Theorems for Sources satisfying $\underline{H}(S) = \overline{H}(S)$

Let $S = \{S^n\}_{n=1}^{\infty}$ be the general source [1]. Here, the
general source means an infinite sequence of random variables
not required to satisfy the consistency condition. First we
consider general sources with one-point spectrum, i.e., the general
sources satisfying $\underline{H}(S) = \overline{H}(S) \triangleq H$, where $\underline{H}(S)$ and $\overline{H}(S)$
are the entropy spectrum-inf and the entropy spectrum-sup
defined in [1]. Let $E = \{E_n\}_{n=1}^{\infty}$ and $W = \{W_n\}_{n=1}^{\infty}$.

For a given constant $h > 0$, we define the $h$-achievable
region for $(R_w, R_e)$ as follows:

Definition 1 Let $h > 0$ be a given constant. A pair of rates
$(R_w, R_e)$ is called $h$-achievable if there exists a sequence of
pairs of an encoder and a decoder $\{(f_n, g_n)\}_{n=1}^{\infty}$ satisfying

$$\limsup_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{W}_n| \le R_w, \qquad (1)$$

$$\limsup_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{E}_n| \le R_e, \qquad (2)$$

$$\lim_{n \to \infty} \Pr\{g_n(f_n(S^n, E_n), E_n) \ne S^n\} = 0, \qquad (3)$$

$$\underline{H}(S|W) \ge h, \qquad (4)$$

Fig.  1  Block  diagram  of  Shannon’s  cipher  system 

where $\underline{H}(S|W)$ denotes the liminf in probability of
$\frac{1}{n} \log_2 \frac{1}{P_{S^n|W_n}(S^n|W_n)}$ and $P_{S^n|W_n}(S^n|W_n)$ denotes the
conditional probability of $S^n$ given $W_n$.

Intuitively, $R_w$ and $R_e$ mean the rates of the public channel
and the secret channel for sufficiently large $n$. Note that
(4) means that with probability close to one a pair of a
source output $s^n \in \mathcal{S}^n$ and a cryptogram $w_n \in \mathcal{W}_n$ satisfies
$P_{S^n|W_n}(s^n|w_n) \le 2^{-n(h-\gamma)}$ for any fixed $\gamma > 0$ if $n$ is
sufficiently large. If (4) is satisfied, a criterion proposed in [3]
is always satisfied.

Definition 2 (Achievable Rate Region)

$$\mathcal{R} = \{(R_w, R_e) : (R_w, R_e) \text{ is } h\text{-achievable}\}. \qquad (5)$$

Then, we have the following theorem on $\mathcal{R}$.

Theorem 1 For an arbitrary $h \in (0, H)$,

$$\mathcal{R} = \mathcal{R}^*,$$

where $\mathcal{R}^* = \{(R_w, R_e) : R_w \ge H \text{ and } R_e \ge h\}$.

III. Coding Theorem for General Sources

For encryption of general sources satisfying $\underline{H}(S) < \overline{H}(S)$
we assume that uniformly distributed random variables
$U = \{U_n\}_{n=1}^{\infty}$, $U_n \in \mathcal{U}_n$, are available only to the encoder. We
define the achievable region $\mathcal{R}$ for the triplet of $R_w$, $R_e$ and
$R_u$ similarly to Definitions 1-2, where $R_u$ specifies $|\mathcal{U}_n|$ by
$\limsup_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{U}_n| \le R_u$. We have the following bounds:

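Theorem 2 For an arbitrary $h \in (0, \underline{H}(S))$,

$$\mathcal{R}^*_{\mathrm{in}} \subset \mathcal{R} \subset \mathcal{R}^*_{\mathrm{out}},$$

where $\mathcal{R}^*_{\mathrm{in}} = \{(R_w, R_u, R_e) : R_w \ge \overline{H}(S),\ R_u \ge \overline{H}(S) - \underline{H}(S) \text{ and } R_e \ge h\}$
and $\mathcal{R}^*_{\mathrm{out}} = \{(R_w, R_u, R_e) : R_w \ge \overline{H}(S),\ R_u \ge 0 \text{ and } R_e \ge h\}$.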

Abstract - We theoretically analyze the statistical behavior of
prediction errors generated by our previously proposed long range
prediction algorithm, and investigate adaptive modulation design
using predicted channel state information (CSI). Both numerical and
simulation results show that accurate prediction of the fading channel
far ahead makes adaptive transmission feasible for rapidly
time-varying mobile radio channels.

1.  Introduction 

Adaptive  modulation  methods  depend  on  accurate  channel  state 
information  (CSI)  that  can  be  estimated  at  the  receiver  and  sent  to  the 
transmitter  via  a  feedback  channel.  This  information  would  allow  the 
transmitter  to  choose  the  appropriate  transmitted  signal.  The  feedback 
delay  and  overhead,  processing  delay  and  practical  constraints  on 
modulation  switching  rates  have  to  be  taken  into  account  in  the 
performance  analysis  of  adaptive  modulation  methods.  For  very  slowly 
fading  channels  (pedestrian  or  low  vehicle  speeds),  outdated  CSI  is 
sufficient for reliable adaptive system design. However, for rapidly
time-variant fading that corresponds to realistic mobile speeds, even a
small delay will cause significant degradation of performance since channel
variation  due  to  large  Doppler  shifts  usually  results  in  a  different 
channel  at  the  time  of  transmission  than  at  the  time  of  channel 
estimation  [1,  2].  To  realize  the  potential  of  adaptive  transmission 
methods,  these  channel  variations  have  to  be  reliably  predicted  at  least 
several  milliseconds  ahead. 

Recently,  we  have  investigated  a  novel  adaptive  long-range  fading 
channel  prediction  algorithm  in  [3].  This  algorithm  characterizes  the 
fading  channel  using  an  autoregressive  (AR)  model  and  computes  the 
Minimum  Mean  Squared  Error  (MMSE)  estimate  of  a  future  fading 
coefficient  sample  based  on  a  number  of  past  observations.  The 
superior  performance  of  this  algorithm  relative  to  conventional  methods 
is due to its low sampling rate [3]. Given a fixed model order, the lower
sampling  rate  results  in  longer  memory  span,  permitting  prediction 
further  into  the  future.  The  prediction  method  is  enhanced  by  an 
adaptive  tracking  method  [3]  that  increases  accuracy,  reduces  the  effect 
of  noise  and  maintains  the  robustness  of  long-range  prediction  as  the 
physical  channel  parameters  vary. 
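
As a rough illustration of the predictor described above, the sketch below computes linear MMSE prediction coefficients from the normal equations. It is not the algorithm of [3]: the Jakes-type autocorrelation J0(2*pi*fd*tau), the observation-noise term `noise_var`, and all function names are assumptions made for this example only.

```python
import math

def bessel_j0(x, terms=60):
    """Zeroth-order Bessel function of the first kind via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mmse_predictor(p, fd, ts, noise_var=1e-3):
    """Return coefficients d_1..d_p and the MSE of one-step-ahead prediction.

    Assumed model: unit-power fading with autocorrelation J0(2*pi*fd*ts*k),
    observed in white noise of variance noise_var, sampled every ts seconds.
    """
    r = [bessel_j0(2.0 * math.pi * fd * ts * k) for k in range(p + 1)]
    big_r = [[r[abs(i - j)] + (noise_var if i == j else 0.0) for j in range(p)]
             for i in range(p)]
    rhs = r[1:p + 1]
    d = solve(big_r, rhs)  # orthogonality principle: R d = r
    mse = r[0] - sum(dj * rj for dj, rj in zip(d, rhs))
    return d, mse

# Illustrative numbers: 100 Hz Doppler, 500 Hz sampling, model order 10.
d, mse = mmse_predictor(p=10, fd=100.0, ts=0.002)
print(mse)
```

With a fixed order `p`, increasing `ts` stretches the time span covered by the `p` past samples, which is the memory-span effect the paragraph above attributes to the low sampling rate.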

In  this  paper,  we  extend  the  application  of  long  range  channel 
prediction  to  adaptive  modulation.  First,  we  theoretically  analyze  the 
statistical  behavior  of  prediction  errors  generated  by  our  long  range 
prediction  algorithm,  and  consider  adaptive  modulation  design  based  on 
this  prediction  error  model  using  predicted  CSI.  Then,  we  evaluate  the 
performance  of  adaptive  modulation  for  flat  Rayleigh  fading  channels. 
The extension of this method to our novel realistic non-stationary fading
model and measured data is discussed in [3,4] and references therein.

2.  Results 

Consider the linear MMSE prediction of the future channel sample
$c_n$ based on $p$ previous samples $c_{n-1}, \ldots, c_{n-p}$ as [3]:

$$\hat{c}_n = \sum_{j=1}^{p} d_j c_{n-j} \qquad (1)$$


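where the coefficients $d_j$ are determined by the orthogonality principle.
We assume that channel samples $c_n$ are modeled as zero-mean complex
Gaussian random variables, i.e., the channel is Rayleigh fading. Thus,
the amplitude $\alpha = |c_n|$ and its predicted value $\hat{\alpha} = |\hat{c}_n|$ have a bivariate
Rayleigh distribution. We define the prediction error $\beta$ as the ratio of
the actual fading gain $\alpha$ and the predicted fading gain $\hat{\alpha}$, i.e., $\beta = \alpha/\hat{\alpha}$.
Then the probability density function (pdf) of $\beta$ can be derived as:

$$p_\beta(x) = \frac{2x(\lambda x^2 + \lambda^{-1})(1-\rho)}{\left((\lambda x^2 + \lambda^{-1})^2 - 4\rho x^2\right)^{1.5}} \qquad (2)$$

where the correlation coefficient $\rho = \mathrm{Cov}(\alpha^2, \hat{\alpha}^2) / \sqrt{\mathrm{Var}(\alpha^2)\,\mathrm{Var}(\hat{\alpha}^2)}$, $0 < \rho < 1$,
$\Omega = E\{\alpha^2\}$, $\hat{\Omega} = E\{\hat{\alpha}^2\}$, and $\lambda = \sqrt{\hat{\Omega}/\Omega}$.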
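
A quick numerical sanity check of the ratio density can be sketched as follows. The code assumes the form $p_\beta(x) = 2x(\lambda x^2 + \lambda^{-1})(1-\rho) / ((\lambda x^2 + \lambda^{-1})^2 - 4\rho x^2)^{1.5}$ with $\lambda = \sqrt{\hat{\Omega}/\Omega}$; the function names and the chosen parameter values are illustrative only.

```python
import math

def pdf_beta(x, rho, lam=1.0):
    """Density of the ratio beta = alpha / alpha_hat for correlation rho."""
    s = lam * x * x + 1.0 / lam
    return 2.0 * x * s * (1.0 - rho) / (s * s - 4.0 * rho * x * x) ** 1.5

def integrate_pdf(rho, lam=1.0, lo=0.0, hi=math.inf, n=20000):
    """Integrate p_beta over [lo, hi] with x = tan(theta) and the midpoint rule."""
    t_lo = math.atan(lo)
    t_hi = math.pi / 2.0 if math.isinf(hi) else math.atan(hi)
    h = (t_hi - t_lo) / n
    total = 0.0
    for i in range(n):
        theta = t_lo + (i + 0.5) * h
        x = math.tan(theta)
        total += pdf_beta(x, rho, lam) / math.cos(theta) ** 2
    return total * h

# The density should integrate to one, and its mass should concentrate
# around beta = 1 as the correlation rho approaches one.
print(integrate_pdf(0.9))
print(integrate_pdf(0.5, lo=0.8, hi=1.25), integrate_pdf(0.99, lo=0.8, hi=1.25))
```

The substitution x = tan(theta) is a design choice made here to handle the slowly decaying tail of the density on a finite interval.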

We consider the fixed power and modulation level-controlled
scheme using Square Multilevel Quadrature Amplitude Modulation
(MQAM) signal constellations for a target Bit Error Rate (BER) of $10^{-3}$.
We restrict ourselves to MQAM constellations of sizes $M$ = 0, 2, 4, 16,
64. Given fixed transmitter power $E_s$ (or the average Signal-to-Noise
Ratio (SNR) level $\bar{\gamma} = E_s/N_0$), to maintain a target BER, we need to
adjust the modulation size $M$ according to the instantaneous channel
gain $\alpha(t)$. In other words, the adaptive modulation scheme can be
specified by the threshold values $\alpha_i$, $i = 1, \ldots, 4$, defined as: when $\alpha(t) > \alpha_i$,
$M_i$-QAM is employed, where $M_1 = 2$ and $M_i = 2^{2(i-1)}$ for $i > 1$. When perfect
CSI $\alpha(t)$ is available, these thresholds can be directly calculated from the
BER bound of MQAM for an Additive White Gaussian Noise (AWGN)
channel [1]:

$$\mathrm{BER}_M \le 0.2 \exp\left(-1.5\,\gamma(t)/(M-1)\right) \text{ for } M > 2, \quad \text{and} \quad \mathrm{BER}_2 = Q\left(\sqrt{2\gamma}\right) \qquad (3)$$

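where $\gamma(t) = \alpha^2(t)\,\bar{\gamma}$ is the instantaneous received SNR. However, when
the predicted CSI $\hat{\alpha}(t)$ is used, the current channel condition is
characterized by the distribution $p(\alpha|\hat{\alpha})$, which can be calculated as:

$$p(\alpha|\hat{\alpha}) = \frac{1}{\hat{\alpha}}\, p_\beta(x)\Big|_{x = \alpha/\hat{\alpha}} \qquad (4)$$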

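Then, the BER bound for predicted CSI $\hat{\alpha}$, say $\mathrm{BER}^*_M$, can be obtained
by evaluating the expectation of $\mathrm{BER}_M$ over $\beta$ using $p_\beta(x)$ in (2) as:

$$\mathrm{BER}^*_M = \int_0^\infty \mathrm{BER}_M(\bar{\gamma} x^2 \hat{\alpha}^2)\, p_\beta(x)\, dx \qquad (5)$$
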


This indicates that we need to use $\mathrm{BER}^*_M$ rather than $\mathrm{BER}_M$ to calculate
thresholds when only the predicted CSI is available. In our study, we
found that when our long range prediction is used for the realistic
prediction range, there is only a small difference between the thresholds
calculated using perfect CSI and predicted CSI [4]. This demonstrates
that  the  long  range  prediction  preserves  the  ideal  bit  rate  while 
maintaining  the  target  BER.  However,  from  the  results  in  [2],  we  found 
that  even  very  small  delay  will  cause  great  loss  of  bit  rate  for  fast 
vehicle  speeds  when  the  strongly  robust  signaling  design  rule  is  used 
without  long  range  prediction.  Thus,  accurate  long-range  prediction  is 
required  to  achieve  the  bit  rate  gain  of  adaptive  MQAM  for  rapid 
vehicle  speeds  and  realistic  delays. 
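
The threshold calculation described in this section can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' procedure: the average SNR of 20 dB, the correlation value 0.99, the truncation of the integral in (5) at `x_max`, and every function name are choices made for this example.

```python
import math

TARGET_BER = 1e-3

def ber_awgn(m, gamma):
    """BER of M-QAM on AWGN as in (3): exact for M = 2, bound for M > 2."""
    if m == 2:
        return 0.5 * math.erfc(math.sqrt(gamma))  # equals Q(sqrt(2*gamma))
    return 0.2 * math.exp(-1.5 * gamma / (m - 1))

def pdf_beta(x, rho, lam=1.0):
    """Density (2) of the prediction error beta = alpha / alpha_hat."""
    s = lam * x * x + 1.0 / lam
    return 2.0 * x * s * (1.0 - rho) / (s * s - 4.0 * rho * x * x) ** 1.5

def ber_predicted(m, alpha_hat, gamma_bar, rho, lam=1.0, x_max=10.0, n=2000):
    """Approximate (5) by the midpoint rule on [0, x_max]."""
    h = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += ber_awgn(m, gamma_bar * (x * alpha_hat) ** 2) * pdf_beta(x, rho, lam)
    return total * h

def threshold(ber_of_alpha, target=TARGET_BER, hi=100.0, iters=50):
    """Smallest gain at which a decreasing BER curve meets the target (bisection)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ber_of_alpha(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

gamma_bar = 100.0  # 20 dB average SNR, illustrative
rho = 0.99         # illustrative prediction quality
for m in (2, 4, 16, 64):
    perfect = threshold(lambda a: ber_awgn(m, gamma_bar * a * a))
    predicted = threshold(lambda a: ber_predicted(m, a, gamma_bar, rho))
    print(m, round(perfect, 4), round(predicted, 4))
```

Comparing the two printed columns for different values of `rho` gives a feel for how prediction quality moves the switching thresholds.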

Abstract — The use of multidimensional alphabets
with correlated tones and noncoherent detection over
Rayleigh fading channels makes it possible to increase the
typically low spectral efficiency of noncoherent transmission
and to compensate for the performance degradation
due to the high correlation between the tones.


I.  Multidimensional  Noncoherent  Detection 

Frequency shift keying (FSK) is a robust modulation
scheme when noncoherent detection is used. In particular,
channel estimation becomes unnecessary for Rayleigh
fading channels. Noncoherent detection [1] (a measure of
the signal envelope after matched filtering) of Q-ary FSK is
usually made with Q orthogonal signals and a tone spacing
$\Delta f_0$ equal to the inverse of the symbol period $T$. The bandwidth
can be reduced if $\Delta f_0 < 1/T$, i.e., if orthogonality is no
longer satisfied. The performance degradation due to the use
of correlated tones can be compensated by careful signal alphabet
design, for example high-dimensional constellations.
We build $N$-dimensional FSK alphabets of size $M$. All signals
$S_m = (s_{m,1}, \ldots, s_{m,N})$, $m = 1, \ldots, M$, are similar to the ones
treated in [2] for the Gaussian channel and have equal energy.
The elementary component $s_{m,n}(t)$, derived from a Q-FSK, is
given by $s_{m,n}(t) = \sqrt{1/T}\, e^{j 2\pi m_n \Delta f_0 t}$ for $-T/2 \le t \le T/2$ and
$1 \le n \le N$. Here $m_n$ is the number of the transmitted tone on the
$n$th component of signal $S_m$, $m_n \in \{1, \ldots, Q\}$.


II.  ML  Performance  Analysis

The channel is assumed to be frequency-nonselective and
slowly fading. The optimal noncoherent demodulator, composed
of a bank of $Q$ matched filters and a signal envelope
detector, produces $Q \times N$ values $r_{q,n}$. For each signal $S_m$,
the set $\{|r_{m_n,n}|^2\}_{n=1,\ldots,N}$ is a sufficient statistic to make a
decision. Following an approach similar to [3], we derive a
simplified structure of the Maximum A Posteriori (MAP) decoder.
A decision is made in favor of the signal $S_m$ which maximises

$$\Lambda_m = \sum_{n=1}^{N} |r_{m_n,n}|^2, \quad m = 1, \ldots, M \qquad (1)$$
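
The decision rule can be illustrated with a small noiseless simulation. The sketch below assumes square-law combining of the matched-filter outputs, models the output of the filter for tone $q$ when tone $m_n$ is sent as the fading gain times $\sin(\pi k \Delta f_0 T)/(\pi k \Delta f_0 T)$ with $k = q - m_n$ (which follows from tones defined on a symmetric window of length $T$), and borrows the 8-signal alphabet listed later in this abstract. The normalized spacing 0.5, the gains, and the function names are hypothetical.

```python
import math

# 4-dimensional 8-FSK alphabet: signal index -> tone numbers m_1..m_4
ALPHABET = {
    1: (1, 2, 1, 2), 2: (2, 4, 3, 4), 3: (3, 1, 5, 6), 4: (4, 3, 7, 8),
    5: (5, 6, 4, 1), 6: (6, 8, 6, 3), 7: (7, 5, 8, 7), 8: (8, 7, 2, 5),
}

def tone_correlation(k, spacing):
    """Correlation of two tones k positions apart; spacing = delta_f0 * T."""
    arg = math.pi * k * spacing
    return 1.0 if arg == 0.0 else math.sin(arg) / arg

def matched_filter_outputs(tones, gains, q, spacing):
    """Noiseless outputs r[tone-1][n] of the Q matched filters on component n."""
    return [[gains[n] * tone_correlation(tone - tones[n], spacing)
             for n in range(len(tones))] for tone in range(1, q + 1)]

def decide(r, alphabet):
    """Pick the signal index maximizing sum_n |r[m_n][n]|^2."""
    def metric(tones):
        return sum(abs(r[tone - 1][n]) ** 2 for n, tone in enumerate(tones))
    return max(alphabet, key=lambda m: metric(alphabet[m]))

gains = [0.7 - 0.2j, 1.1j, -0.4 + 0.9j, 0.3 + 0.3j]  # arbitrary fading gains
r = matched_filter_outputs(ALPHABET[3], gains, q=8, spacing=0.5)
print(decide(r, ALPHABET))
```

Without noise the transmitted signal always wins, because every mismatched component is attenuated by a correlation of magnitude below one.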

The pairwise error probability $P(S_i \to S_j)$ can be derived
from (1) by $P(\Lambda_i < \Lambda_j)$ [3]. Finally, $P(S_i \to S_j)$ is given by


E 


i-r 


/  1-Ihn|2  ' 

0.5x(l-|Pn|2)  1 

1 1  -|p„|2r2_ 

nr=i^„(i^!2-!^!2)  j 

(2) 


where $\Gamma$ is a signal-to-noise ratio and $\rho_n$ is the correlation
between the $n$th components of signals $S_i$ and $S_j$. For all
$k, l \in \{1, \ldots, N\}$ with $k \ne l$, we suppose that $\rho_k \ne \rho_l$.

III.  Multidimensional  Alphabets  Results

Fig. 1: 4-dimensional 8-FSK correlated signals vs. BFSK
signals with order of diversity 4.

Two alphabets of dimension 4 are compared. Each component
$s_{m,n}(t)$ is denoted by the number of the transmitted tone
$m_n$. $B_e$ is the bandwidth expansion, defined as the inverse of
the spectral efficiency. The first alphabet is composed of two
orthogonal BFSK signals with diversity 4: $S_1 = (1,1,1,1)$
and $S_2 = (2,2,2,2)$. Its theoretical performance can be derived
from equation (4.61) in [3] with $N = 4$ and $\rho_n = \rho = 0$
(orthogonal tones). The second alphabet of size $M = 8$, designed
in a heuristic manner, is based on 8-FSK correlated
signals.

$S_1 \to (1,2,1,2)$    $S_5 \to (5,6,4,1)$
$S_2 \to (2,4,3,4)$    $S_6 \to (6,8,6,3)$
$S_3 \to (3,1,5,6)$    $S_7 \to (7,5,8,7)$
$S_4 \to (4,3,7,8)$    $S_8 \to (8,7,2,5)$


Notice that this set also has an order of diversity equal to 4
and that the most correlated signals are $S_4$ and $S_7$.

It can be easily shown that $P(S_4 \to S_7) \le P_e \le (M-1)\,P(S_4 \to S_7)$.
The pairwise error probability values are derived
from equation (2). The BER is given by $P_b = P_e/2$.
Two main results are highlighted in Figure 1. First, simulation
results validate both upper and lower bounds. Moreover,
the correlated signals alphabet exhibits excellent results when
compared to the classical BFSK alphabet, although the spectral
efficiency is three times larger.

Abstract — Multipath propagation effects encountered
in mobile wireless channels provide additional
degrees of freedom that can be exploited via appropriate
signaling and reception. In this paper, we propose
a framework for spread-spectrum signaling and reception
that allows manipulating these inherent degrees
of freedom for maximum bandwidth efficiency. We
present a simple approach for transforming a multipath
channel with a single transmit and single receive
antenna into a virtual multiple-input multiple-output
system where space-time codes can be directly applied.
Performance analysis suggests that simple signaling
schemes based on our framework can yield significant
capacity gains over existing spread spectrum
systems.

I.  Summary 

Time-varying multipath propagation effects encountered in
mobile wireless channels provide additional degrees of freedom
that can be exploited for bandwidth-efficient communication
via appropriate signaling and reception. In spread-spectrum
code division multiple access (CDMA) systems, signals of sufficiently
high bandwidth and long time durations can be used
with the RAKE receiver to exploit multipath-Doppler diversity
[1]. In essence, uncorrelated time-varying multipath scattering
provides degrees of freedom (DoFs) that can be exploited
to enhance performance. However, conventional systems
exploit all these DoFs for receiver diversity and provide
diminishing returns as the DoFs increase.

Recent studies on antenna arrays have shown that the
capacity of multiple-input multiple-output (MIMO) systems
far exceeds that of single-input single-output (SISO), single-input
multiple-output (SIMO) and multiple-input single-output
(MISO) systems in a dense scattering environment.
Motivated by these results, we propose a new transceiver
structure for the multipath fading channel that allows manipulating
the inherent degrees of freedom for bandwidth efficiency.
In effect, we present a simple approach for transforming
a multipath channel with $L$ degrees of freedom ($L$ independent
paths) into a virtual transmit-receive antenna array
system with $M$ transmitters and $N$ receivers, for any $M$ and
$N$ such that $L = MN$.

We consider spread-spectrum signaling over a frequency-selective,
slowly fading channel with multipath spread $T_m$.
The transmitted signal is of duration $T$ and bandwidth
$B$. The impulse response of the channel is given by $h =
[h_1, h_2, \ldots, h_L]$, where $L = T_m B$ is the number of degrees of
freedom available in the system. Since the dimensionality of
the signal space is $K \approx TB$ [2], we can obtain a matrix formulation
of the system by projecting onto $K$ basis waveforms that
capture the sufficient statistics. The system can be viewed as
a $K$-input $K$-output system over the signal space and represented
in the form $y = Hx + w$, where $x$ is the transmitted
signal vector, $y$ is the received signal vector, $H$ is the channel
matrix and $w$ is AWGN. When Nyquist sampling is done, the
basis functions are sinc pulses and the channel matrix is block
Toeplitz. In our direct sequence CDMA system, we choose
the basis functions to be circularly-shifted versions of an arbitrary
signature waveform corresponding to a spreading code
of length $K$. In this case, the components of $x$ are modulated
onto circularly-shifted versions of a signature waveform and
transmitted. This choice of basis leads to a circulant $H$. For
this $K$-input $K$-output system, we study interesting special
cases where the transmitter and receiver use only a subset of
the $K$ available dimensions.

In our framework, the conventional RAKE receiver corresponds
to transmitting a single signature waveform and
can be viewed as a 1-input $L$-output system. When
$x = [x_1, x_2, \ldots, x_M, 0, \ldots, 0]$ and the receiver looks only at
$[y_M, y_{2M}, \ldots, y_{NM}]$, where $L = MN$, the multipath channel
can be viewed as a virtual $M$-input $N$-output system. The
$N \times M$ matrix $H$ contains the $L$ channel coefficients as its elements.
In an uncorrelated scattering Rayleigh fading model,
the elements of $H$ are uncorrelated. The system is equivalent
to an antenna array system with $M$ transmitters, $N$ receivers
and independent coupling between antenna pairs. Existing
space-time codes such as those in [3] can be directly applied
to this system.
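
The way the sub-sampled circulant system collects the $L$ channel coefficients can be checked with a few lines of code. The sketch below assumes a circulant matrix whose first column is the impulse response padded with zeros to length $K \ge L$; the function names and the toy values are illustrative.

```python
def circulant(first_col):
    """K x K circulant matrix with the given first column."""
    k = len(first_col)
    return [[first_col[(i - j) % k] for j in range(k)] for i in range(k)]

def virtual_mimo(h, m, n, k):
    """N x M matrix seen when inputs x_1..x_M are used and only
    outputs y_M, y_2M, ..., y_NM are kept (1-based indices)."""
    assert len(h) == m * n and k >= len(h)
    full = circulant(list(h) + [0] * (k - len(h)))
    rows = [r * m - 1 for r in range(1, n + 1)]  # 0-based row indices
    return [[full[i][j] for j in range(m)] for i in rows]

# Toy channel with L = 6 taps labeled 1..6, viewed as a (2, 3) system.
h = [1, 2, 3, 4, 5, 6]
for row in virtual_mimo(h, m=2, n=3, k=8):
    print(row)
```

Each of the six taps appears exactly once in the 3 x 2 matrix, which is what lets the elements be treated as independent couplings under uncorrelated scattering.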

We consider outage capacity as the performance measure.
Transforming the $(1, L)$ system into an $(M, N)$ system provides
clear capacity gains due to an increase in the number of parallel
channels. For example, at the 1%, 5% and 10% outage
levels and for high SNR (larger than 20 dB), the improvement
in performance of (2, 2) over (1, 4) is almost 5 dB, and the improvement
of (2, 3) over (1, 6) is more than 7 dB. The $(M, N)$
systems we propose also have a low complexity transceiver
structure and existing space-time codes can be directly employed.
These results suggest that simple modifications based
on our framework can significantly improve the capacity of
existing single-antenna spread spectrum systems that employ
the RAKE receiver.

Abstract — We examine the problem of designing complex,
equal energy, signal constellations for the noncoherent additive
white Gaussian noise communication channel. We derive an asymptotic
performance criterion that may be used as a constraint
in building correlated signal sets for use with the maximum likelihood
noncoherent detector. We present an iterative update design
procedure for obtaining bandwidth-efficient signal sets under
a constraint on the dimension of the signal space.

I.  Introduction 

On  noncoherent  communication  channels,  orthogonal  multi-pulse 
modulation  (OMM)  is  typically  employed  wherein  the  user  transmits 
one of M orthogonal signals during each baud interval [1]. The most
common  implementation  of  OMM  is  frequency  shift-keying  (FSK). 
The  chief  advantage  of  OMM  is  the  simple  implementation  (envelope 
detection)  of  the  receiver.  The  major  drawback  to  OMM  is  its  poor 
spectral  efficiency.  Non-orthogonal  multi-pulse  modulation  (NMM) 
combats  this  drawback  by  allowing  correlation  among  the  signals. 

II.  Problem  Statement 

In NMM, an $M$-ary symbol is sent by transmitting one of $M$ equal-energy,
complex-valued signals that lie in an $N$-dimensional signal
space. The minimum bandwidth $B$ needed to generate such signals
is $N/T$ Hz, where $T$ is the baud interval. The discrete-time model
for NMM signaling over the additive white Gaussian noise (AWGN)
channel is hence

$$y = \sqrt{E}\, e^{j\phi_m} h_m + n, \qquad (1)$$

where $m \in \{1, \ldots, M\}$ is the transmitted symbol and the corresponding
signal $h_m$ is a unit-norm complex vector lying in $\mathbb{C}^{N \times 1}$;
$E$ is the received energy for each symbol; $\phi_m$ is an unknown phase,
modeled as a uniform random variable on $[0, 2\pi)$; and $n$ is a zero-mean
complex normal random vector with correlation $\mathrm{E}[nn^*] =
\sigma^2 I$, where $^*$ denotes complex-conjugate transpose.

Assuming equi-probable symbols, the optimum detector selects
the signal that maximizes the magnitude of its inner product with
the received signal:

$$\hat{m} = \arg\max_{m} |y^* h_m|. \qquad (2)$$

This detector has a probability of error which is asymptotically a
monotonic function of the largest magnitude of the cross-correlation
coefficient $\rho = \max_{m \ne l} |h_m^* h_l|$. Define the signal correlation matrix
$R$ with $R_{ml} = h_m^* h_l$.

We  formulate  the  problem  of  designing  a  bandwidth-efficient  mod¬ 
ulation  scheme  purely  in  terms  of  R  as  follows: 

Problem Statement: Given $N \in \mathbb{N}$ and $0 < \rho < 1$, find the
largest $M \in \mathbb{N}$ for which the corresponding $R \in \mathbb{C}^{M \times M}$ satisfies
C1: $\mathrm{diag}(R) = I$, C2: $|R_{i,j}| \le \rho$ for $i \ne j$, C3: $R \succeq 0$,
C4: $\mathrm{rank}(R) \le N$. The noncoherent signal set is then formed (non-uniquely)
from the eigen-decomposition of $R$, $R = U \Lambda U^*$, via

$$H = \Lambda^{1/2} U^*.$$
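
To make the constraints and the detector concrete, the sketch below builds the correlation matrix of a small hand-picked signal set, reads off the maximum cross-correlation, and applies the detection rule (2) to a noiseless received vector with an arbitrary phase. The signal set (three unit vectors in two dimensions, 60 degrees apart) and all function names are illustrative assumptions, not a design from the paper.

```python
import cmath
import math

def inner(a, b):
    """a^* b for complex vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def correlation_matrix(signals):
    """R with entries R[m][l] = h_m^* h_l."""
    return [[inner(hm, hl) for hl in signals] for hm in signals]

def max_cross_correlation(r):
    """Largest off-diagonal magnitude of R."""
    size = len(r)
    return max(abs(r[m][l]) for m in range(size) for l in range(size) if m != l)

def detect(y, signals):
    """Noncoherent rule (2): index of the signal maximizing |y^* h_m|."""
    return max(range(len(signals)), key=lambda m: abs(inner(y, signals[m])))

# Illustrative set: M = 3 unit-norm signals in N = 2 dimensions.
signals = [[complex(math.cos(k * math.pi / 3)), complex(math.sin(k * math.pi / 3))]
           for k in range(3)]
R = correlation_matrix(signals)
print(max_cross_correlation(R))          # close to 0.5 for this set
y = [math.sqrt(2.0) * cmath.exp(1.234j) * v for v in signals[1]]
print(detect(y, signals))                # index of the transmitted signal
```

This set satisfies C1 by construction and C2 for any $\rho \ge 0.5$, and it carries $\log_2 3$ bits in two dimensions, compared with one bit for two orthogonal signals.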


III.  Successive  Updates  of  the  Correlation  Matrix 

We consider a successive update procedure whereby a matrix $R_k$
satisfying C1-C4 is updated with a vector $x$ via

$$R_{k+1} = \begin{bmatrix} 1 & x^* \\ x & R_k \end{bmatrix} \qquad (3)$$

with $R_{k+1}$ also satisfying C1-C4. It turns out that we can guarantee
that $R_{k+1}$ is positive semidefinite if $x^* R_k^{\dagger} x \le 1$, and that
the rank of $R_{k+1}$ is equal to that of $R_k$ when this condition is met
with equality. In our designs, we started with a two-dimensional $R_1$,
and successively added signals until the constraint could not be met
with equality. At this point, the rank of the matrix was allowed to
grow by one and the process repeated. At each iteration, we maximized
the norm, $\|x\|_2$, under the constraint that $\max_k |x_k| \le \rho$. This
is a nonlinear optimization problem and was solved using a modified
Fletcher-Powell optimization algorithm employed through the
FSQP [2] optimization package.


IV.  Results 

In Figure 1, we plot the spectral efficiency ($\log_2 M / N$) of our designs
versus the SNR-per-bit required to achieve a probability of bit
error of $10^{-5}$. For the NMM designs, we held the dimensionality,
$N$, fixed and varied the maximum cross-correlation, $\rho$. For comparison,
we also plot the spectral efficiencies of coherent PAM and QAM
modulation as well as the capacity curve for the coherent channel. We
also plot the spectral efficiency of one-sided PAM [3, problem 4.16],
a scheme in which a fixed waveform has its energy varied to transmit
information. These results show that we can map out new portions of
the energy/spectral efficiency plane through our signal designs, and
that NMM can be made significantly more bandwidth-efficient than
OMM.


Fig.  1.  Energy  versus  spectral  efficiency  of  the  noncoherent  designs. 

By definition, a spherical $t$-design in $N$-dimensional Euclidean
space $\mathbb{R}^N$ is any nonempty finite set $X \subset S_{N-1}$ which
for any polynomial $f$ of degree at most $t$ satisfies

$$\frac{1}{|S_{N-1}|} \int_{S_{N-1}} f(x)\, d\mu(x) = \frac{1}{|X|} \sum_{x \in X} f(x),$$

where $S_{N-1} := \{(x_1, \ldots, x_N) \in \mathbb{R}^N : x_1^2 + \ldots + x_N^2 = 1\}$ is the
unit sphere in $\mathbb{R}^N$, $\mu(x)$ is the standard Euclidean measure on
$S_{N-1}$ (i.e., $\mu$ is invariant under the orthogonal group $O(N)$)
and $|S_{N-1}| := \int_{S_{N-1}} d\mu(x)$ is the surface area of $S_{N-1}$. For
the basic properties of $t$-designs, we refer the reader to the
papers [1-3].

We denote by Hom($k$) the space of all homogeneous $N$-variate
polynomials of degree $k$ over $\mathbb{R}$ and by Harm($k$) the
space of homogeneous $N$-variate harmonic polynomials of degree
$k$, i.e., the space of homogeneous polynomials $y = y(x)$
satisfying the Laplace equation $\frac{\partial^2 y}{\partial x_1^2} + \ldots + \frac{\partial^2 y}{\partial x_N^2} = 0$.

Let $G$ be a finite group of orthogonal matrices, and let $a$ be
a point on $S_{N-1}$. All designs of the form $X := Ga := \{ga \mid g \in
G\}$, i.e., those for which $X$ is an orbit of an initial point $a$, constitute a
natural class of designs. In [2] it was proved that the orbit
$Ga$ is a $t$-design for any $a \in S_{N-1}$ if and only if the space $H(t) :=
\mathrm{Harm}(1) + \ldots + \mathrm{Harm}(t)$ contains no $G$-invariant harmonic
polynomial. Moreover, in the cited paper it was shown that if
$H(t)$ contains some $G$-invariant harmonic polynomials, then
they can be "killed" by choosing their common root as the initial
point $a$. In this case, the orbit $Ga$ is a $t$-design.

In  the  present  paper  we  state  the  results  discussed  above 
in  a  somewhat  more  general  and  convenient  form. 

As an example, we consider the following well-known results.
The orbit $.0a$ of the Conway group $.0$ of all orthogonal
transformations that fix the Leech lattice is an 11-design for
any initial vector $a$, because the first $.0$-invariant polynomial
with zero mean has degree 12. If we take the vector
$e = 32^{-1/2}(-3, 1^{23}) \in S_{23}$ (see [5], Chapter 4, §11) as $a$, we
obtain an 11-design consisting of 196560 elements. Observe
that $2e$ is one of the vectors of minimal length in the Leech
lattice. The total number of such vectors is 196560 and the
group $.0$ acts transitively on the set of these vectors [5].

The main result of the present paper is an explicit construction
of an infinite family of 11-designs in the $2^n$-dimensional
Euclidean space based on the groups $\Phi_{n,2}$ and $E_{n,2}$, $n =
1, 2, \ldots$, of orthogonal $(2^n \times 2^n)$-matrices; these groups were
introduced in [6]. The same construction is proposed also for
9-designs. The group $\Phi_{n,2}$ is a subgroup of index 2 of the
group $E_{n,2}$.

For $n = 2$, the group $\Phi_{2,2}$ is of order 1152 and is generated
by the 16 matrices $\mathrm{diag}(\pm 1, \pm 1, \pm 1, \pm 1)$, the Hadamard matrix

$$\frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix},$$

and the 24 permutation matrices
$P_n$ corresponding to the affine mappings $x \to Qx + \alpha$ of the
space $\mathbb{F}_2^n$ into itself (here $Q \in M_n(\mathbb{F}_2)$ is a nonsingular matrix
and $\alpha \in \mathbb{F}_2^n$).

We prove that the space of $\Phi_{n,2}$-invariant harmonic polynomials
$f(x)$ of degree at most 9 with zero mean is one-dimensional
and possesses a generator $h^{(n)}(x)$ of degree 8.
The space of $E_{n,2}$-invariant harmonic polynomials $f(x)$ of
degree at most 11 with zero mean is also one-dimensional and
has the same generator $h^{(n)}(x)$. Therefore, for any root $a$ of
the polynomial $h^{(n)}(x)$ the orbit $\Phi_{n,2}a$ is a 9-design and the
orbit $E_{n,2}a$ is an 11-design.

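Note that

$$h^{(2)}(x) = \sum_{i=1}^{4} x_{\alpha_i}^8 + 7 \sum_{i \ne j} x_{\alpha_i}^4 x_{\alpha_j}^4 + 168 \prod_{i=1}^{4} x_{\alpha_i}^2 - \frac{7}{10}\Big(\sum_{i=1}^{4} x_{\alpha_i}^2\Big)^4,$$

where the variables are labeled by the elements of the two-dimensional space $\mathbb{F}_2^2 =
\{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ over the field $\mathbb{F}_2$.

The vector $c(x_0)$, where $c(x) = \frac{\sin x}{\sqrt{3}}(0, 1, 1, 1) +
\cos x\,(1, 0, 0, 0)$ and $x_0$ is a root of the equation

$$h^{(2)}(c(x)) = \cos^8 x + \tfrac{14}{3} \cos^4 x \sin^4 x + \tfrac{56}{9} \cos^2 x \sin^6 x + \tfrac{5}{9} \sin^8 x - \tfrac{7}{10} = 0,$$

is one of the roots of the polynomial $h^{(2)}(x)$. The orbit codes
$\Phi_{2,2}c(x_0)$ and $E_{2,2}c(x_0)$ contain 96 and 192 points, respectively,
and are 9- and 11-designs in 4-dimensional Euclidean
space.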
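
The initial point can be located numerically. The sketch below assumes the degree-8 polynomial $h^{(2)}(x) = \sum_i x_i^8 + 7\sum_{i \ne j} x_i^4 x_j^4 + 168 \prod_i x_i^2 - \frac{7}{10}(\sum_i x_i^2)^4$ and the curve $c(x) = \frac{\sin x}{\sqrt{3}}(0,1,1,1) + \cos x\,(1,0,0,0)$; the bracketing interval $[0, 0.5]$ and the function names are choices made for this example, and other roots of the same equation exist.

```python
import math

def h2(v):
    """Degree-8 invariant polynomial evaluated at a point v of R^4."""
    s8 = sum(x ** 8 for x in v)
    s44 = sum(v[i] ** 4 * v[j] ** 4 for i in range(4) for j in range(4) if i != j)
    prod = 1.0
    for x in v:
        prod *= x * x
    s2 = sum(x * x for x in v)
    return s8 + 7.0 * s44 + 168.0 * prod - 0.7 * s2 ** 4

def c(x):
    """Unit vector (sin x / sqrt 3) * (0,1,1,1) + cos x * (1,0,0,0)."""
    s = math.sin(x) / math.sqrt(3.0)
    return (math.cos(x), s, s, s)

def g(x):
    """Closed form of h2(c(x))."""
    co, si = math.cos(x), math.sin(x)
    return (co ** 8 + 14.0 / 3.0 * co ** 4 * si ** 4
            + 56.0 / 9.0 * co ** 2 * si ** 6 + 5.0 / 9.0 * si ** 8 - 0.7)

def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = bisect(g, 0.0, 0.5)
print(x0, c(x0), h2(c(x0)))
```

Generating the full orbit would additionally require the group elements, which are not constructed here.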

Similar methods can be used to construct 9- and 11-designs
on the sphere in 8-dimensional Euclidean space. The resulting
designs consist of 15360 and 30720 points, respectively.

Abstract — In this work we investigate the design
of constellation mappings for the transmission of non-uniform
memoryless sources over AWGN channels via
$M$-ary modulation schemes. We show that constellation
mappings which minimize the average symbol
energy and, given this, maximize the decoding probability
of the most likely signals, can yield SER and
BER performance that is better than Gray encoding
maps. We also find that for highly non-uniform
sources, 16-QAM can perform better than 2-QAM, in
terms of both throughput and BER.

I.  Introduction 

For equally likely signals, Gray mapping in two-dimensional
signaling is generally accepted as optimal for minimizing bit
error rate (BER). However, many data sources generate non-uniformly
distributed symbols, often with memory (e.g. image
or speech signals). Thus, they contain a substantial amount
of (natural or residual) redundancy which, after transmission
over a noisy channel, can be appropriately exploited by a
maximum-a-posteriori (MAP) detector to improve the overall
error resilience of the communication system [1].

In this work we propose criteria for constructing mappings
from a set of signals to points of a two-dimensional constellation.
We show that for non-uniform sources Gray mapping is
not necessarily optimal for minimizing BER or symbol error
rate (SER). We illustrate this in the context of an uncoded
communication system with QAM modulated, non-uniform
signals sent over an AWGN channel, and decoded using MAP
decoding. We also illustrate that, when using MAP decoding
for highly non-uniform signals, the BER performance of
16-QAM can be better than that of 2-QAM, even though
16-QAM has four times higher throughput.

II.  Constellation  Mappings  for  MAP  Decoding 

We propose the following criteria (listed in order of priority)
for constructing mappings from a set of $M$ non-uniformly
distributed symbols to the points of a two-dimensional constellation:
(i) minimize the average energy per symbol for
the $M$ given symbol probabilities, and (ii) successively minimize
the conditional symbol decoding error probabilities, going
from the most likely to the least likely symbol. The following
determines the mapping which satisfies criterion (i),
up to permutations within sets of symbols with the same
energy: given $M$ symbol probabilities $\{p_i\}_{i=1}^{M}$ with energies
$E_1 \le \ldots \le E_M$, any permutation $\pi$ of $\{1, 2, \ldots, M\}$ which
satisfies $p_{\pi(1)} \ge \ldots \ge p_{\pi(M)}$ minimizes $\sum_{i=1}^{M} E_i\, p_{\pi(i)}$.
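
Criterion (i) amounts to pairing the most likely symbols with the lowest-energy points. A minimal sketch follows, with a brute-force check on a small case; the 16-QAM grid $\{-3,-1,1,3\}^2$, the 4-bit Bernoulli source with bit-0 probability 0.9, and the function names are assumptions for illustration.

```python
from itertools import permutations

def min_energy_mapping(probs, energies):
    """Map symbol index -> point index, most likely symbol to lowest energy."""
    by_prob = sorted(range(len(probs)), key=lambda s: -probs[s])
    by_energy = sorted(range(len(energies)), key=lambda e: energies[e])
    return dict(zip(by_prob, by_energy))

def average_energy(mapping, probs, energies):
    """Average symbol energy of a symbol -> point mapping."""
    return sum(probs[s] * energies[pt] for s, pt in mapping.items())

def brute_force_minimum(probs, energies):
    """Minimum average energy over all mappings (small M only)."""
    idx = range(len(probs))
    return min(average_energy(dict(zip(idx, perm)), probs, energies)
               for perm in permutations(idx))

# 16-QAM point energies and the probabilities of all 4-bit symbols.
energies16 = [i * i + q * q for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
p = 0.9  # probability that a source bit equals 0
probs16 = [p ** (4 - bin(s).count("1")) * (1 - p) ** bin(s).count("1")
           for s in range(16)]
best = min_energy_mapping(probs16, energies16)
print(average_energy(best, probs16, energies16))
```

Ties among equal-energy points are broken arbitrarily here; criterion (ii) is what resolves them in the scheme described in the text.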

Subject to criterion (i), we next consider criterion (ii). Let
$s_1, \ldots, s_M$ denote the signals listed from most likely to least
likely. We propose a simple heuristic for successively minimizing
the conditional probabilities $P(\text{Symbol Error} \mid s_i \text{ sent})$.
Starting with symbol $s_1$, and subject to not violating criterion
(i), choose neighbours of $s_1$ to be least likely signals to maximize
the area of the decoding region of signal $s_1$. Continue
to allocate signals in this way until there are no signals left to
allocate.

This work was supported in part by NSERC of Canada.

III.  Numerical  Results 

We consider a Bernoulli($p$) source sent over an AWGN channel
with 16-QAM modulation and MAP decoding. BER calculations
were done using the upper and lower bounds in [2],
which coincide with each other when plotted. Fig. 1 shows
a 16-QAM constellation with a mapping $M_1$. For $p > 0.5$,
the mapping $M_1$ minimizes the average symbol energy (criterion
(i)) and, subject to this, for any noise variance $N_0/2$, the
mapping $M_1$ also maximizes the conditional probability that
symbol 0000 (the most likely symbol) is decoded, given that
0000 is sent. This is due to the fact that symbol 0000 has
the least likely neighbours, subject to criterion (i); thus the decision
region for 0000 is maximized. The remaining symbols
are placed in the constellation to successively maximize the
decoding regions of 0001, 0100, and 0010, in that order.

Figure 1: Mappings $M_1$ and Gray (in parentheses).

Under the mapping $M_1$, 16-QAM modulation with $p =
0.9$ and MAP decoding performs better than the usual Gray
mapping, gaining roughly 1 dB and 0.75 dB in $E_b/N_0$ (at error
rates between $10^{-5}$ and $10^{-2}$) for SER and BER, respectively.
We also note that 16-QAM with the mapping $M_1$ achieves
around 1 dB gain over 2-QAM for $p = 0.9$ and the same BER.
This  leads  us  to  the  interesting  observation  that  while  the 
conventional  wisdom  for  equally  likely  signals  is  that  there  is 
a  tradeoff  between  throughput  and  BER,  with  non-uniform 
signals  there  need  not  be  such  a  tradeoff.  Indeed,  in  this 
example  16-QAM  achieves  both  four  times  the  throughput  and 
better  BER  performance  than  2-QAM  when  p  =  0.9. 

Abstract — In this paper, we introduce a novel
scheme using a constellation shaping approach to reduce
the peak-to-average ratio (PAR) in orthogonal
frequency-division multiplexing (OFDM) systems. In
the time domain, the peak power bound traces out
a hypercube boundary. We map this square time-domain
boundary back to the frequency domain via
the DFT and construct a method for indexing the
OFDM constellation points. The encoding and decoding
of the constellation use generators and relations
from group theory. The end result is a coding scheme
with nearly 20 dB of PAR reduction with no reduction
in data rate or performance.


I  Introduction 


In an orthogonal frequency division multiplex (OFDM)
system, the output time samples are generated by the inverse
FFT of the constellation points. When each channel
takes on a constellation point with the maximum power,
the peak power is $N$ times the average power. Thus,
the time samples may occasionally have very high output
levels, which leads to the requirement of an expensive,
highly linear, and power-inefficient analog front end
(AFE) and/or a clipping mechanism to limit the time sample
magnitude, which leads to impulsive noise and performance
degradation. High PAR is arguably the greatest
drawback of OFDM.
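
The factor of $N$ can be reproduced with a short calculation. The sketch below assumes a unitary inverse DFT (scaling by $1/\sqrt{N}$) and loads every subchannel with the same corner point of a 16-QAM constellation; the function names and the choice $N = 16$ are illustrative.

```python
import cmath
import math

def idft(freq):
    """Unitary inverse DFT of a list of complex constellation points."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n) for t in range(n)]

def par(samples):
    """Peak power divided by average power of a block of time samples."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))

n_channels = 16
block = [3 + 3j] * n_channels          # same maximum-power point everywhere
ratio = par(idft(block))
print(ratio, 10.0 * math.log10(ratio))  # ratio equals N for this block
```

All the energy of this block lands in a single time sample, which is the worst case that the analog front end or clipping stage has to accommodate.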

Numerous  methods  have  been  proposed  to  reduce  the 
PAR  of  OFDM.  They  tend  to  be  tradeoffs  between  PAR 
and  data  rate  or  distortion.  We  propose  a  method  for 
peak power reduction in OFDM systems based on constellation
shaping [1], which can provide nearly 20 dB of
PAR  reduction  while  maintaining  equivalent  data  rate  and 
performance.  In  addition,  it  can  be  combined  with  other 
existing  methods  to  further  increase  the  PAR  reduction. 

II  Constellation  Shaping 

OFDM systems can operate either in baseband (as
in the ADSL standard) or in passband; we examine
only the baseband case here, although the
method applies to both variations. We restrict
$x = [\, x_0 \;\cdots\; x_{N-1} \,]$ to be real. This allows us to define

$X = [\, \mathrm{Re}\,X_0 \;\cdots\; \mathrm{Re}\,X_{N/2} \;\; \mathrm{Im}\,X_1 \;\cdots\; \mathrm{Im}\,X_{N/2-1} \,]$

and $A_N$ as the matrix whose columns are samples of $\sin(2\pi kn/N)$ and $\cos(2\pi kn/N)$, and we have

$x = A_N X$.

°This work was supported by a Tellabs Fellowship at the University
of Illinois and by the National Science Foundation, grant no.
OCR 99-79381.


Figure 1: The PAR reduction versus the number of channels
and constellation size.

The constellation boundary is usually determined by
the metric that we want to optimize. In the problem of
PAR reduction in OFDM systems, we use the $\infty$-norm,
$\|x\|_\infty$, in the time domain. This metric traces out a square
boundary, defined by $\|x\|_\infty = \beta$. In the frequency domain,
we get $\|x\|_\infty = \|A_N X\|_\infty = \beta$. This is an $N$-dimensional
parallelotope in the frequency domain defined by $A_N^{-1}$. To
encode and decode the constellation points inside this new
boundary, we use group theory to compute the generators
for indexing these points.
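
The relation $x = A_N X$ and the peak constraint can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes an even $N$, an unnormalized forward DFT with a $1/N$ factor in the inverse, and the coordinate order Re X_0, ..., Re X_{N/2}, Im X_1, ..., Im X_{N/2-1}; the scaling convention and all function names are illustrative assumptions.

```python
import cmath
import math

def real_idft_matrix(N):
    """A_N for even N: maps [Re X_0..Re X_{N/2}, Im X_1..Im X_{N/2-1}] to real x."""
    half = N // 2
    A = []
    for n in range(N):
        # Cosine columns; X_0 and X_{N/2} have no conjugate partner, so weight 1.
        row = [(1.0 if k in (0, half) else 2.0) * math.cos(2 * math.pi * k * n / N) / N
               for k in range(half + 1)]
        # Sine columns for the imaginary parts of X_1 .. X_{N/2-1}.
        row += [-2.0 * math.sin(2 * math.pi * k * n / N) / N for k in range(1, half)]
        A.append(row)
    return A

def real_coords(x):
    """Stack real and imaginary DFT parts of a real sequence x in the same order."""
    N, half = len(x), len(x) // 2
    X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
         for k in range(half + 1)]
    return [c.real for c in X] + [X[k].imag for k in range(1, half)]

def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inf_norm(v):
    """Peak magnitude, i.e. the infinity norm."""
    return max(abs(t) for t in v)

x = [0.5, -1.0, 0.25, 0.75, -0.5, 1.0, 0.0, -0.25]
X = real_coords(x)
x_back = matvec(real_idft_matrix(len(x)), X)   # should reproduce x
beta = inf_norm(x_back)                        # peak of the time-domain block
```

A frequency-domain point $X$ lies inside the shaped boundary exactly when `inf_norm(matvec(A, X))` does not exceed the chosen $\beta$, which is the parallelotope condition stated above.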

We  present  some  results  using  this  algorithm.  Figure  1 
shows  the  total  amount  of  peak-power  reduction  using  this 
constellation  shaping  with  various  numbers  of  channels 
and constellation sizes. We see that a reduction of over 20
dB  is  possible  when  the  constellation  size  is  large.  Even 
with  a  typical  constellation  size,  a  peak  power  reduction 
of  over  15  dB  is  easily  realized. 
Abstract — In this paper, a convenient set of functions
is identified whose span includes all functions in
a tower of function fields of Garcia and Stichtenoth
that have poles only at the place at infinity. The latter
set is of interest in the construction of long and
efficient AG codes.

I.  Introduction 

The Gilbert-Varshamov (G-V) bound is commonly used to
assess the performance of long codes. While it is known that
there exist long alternant and concatenated codes that meet
the G-V bound, no explicit description of these codes exists.

Around 1980, V. D. Goppa used the theory of algebraic
curves to construct a new family of codes, now referred to as
algebraic geometric (AG) codes. Code performance depends
upon the ratio $N/g$ of two curve parameters, the genus $g$ and
the number of (rational) points $N$. Good codes result in cases
where the ratio $N/g$ is large, and the Drinfeld-Vlăduţ (D-V)
bound $\limsup_{g \to \infty} N/g \le \sqrt{q} - 1$ places an upper bound on
the ratio.

In 1982, Tsfasman, Vlăduţ, and Zink (T-V-Z) showed the
existence of curves whose $N/g$ ratio achieves the D-V bound.
The resulting AG codes had performance exceeding that of
the Gilbert-Varshamov bound, a feat that until then was
considered unattainable.

However, the T-V-Z result is existential in nature. In 1996,
Garcia and Stichtenoth (G-S) showed that two families of
curves, each with an explicit description as a tower of function
fields, also achieve the D-V bound. Identifying the generator
matrices for “one-point” AG codes constructed on these
curves requires the determination of a basis for the vector
spaces $\mathcal{L}(rP)$, which comprise functions having poles only at a
specified point $P$. The results in this paper present an important
step towards determining this basis. A simply described
set of functions whose span includes the vector spaces $\mathcal{L}(rP)$
is provided.

In  [6],  the  authors  provide  generator  matrices  for  codes 
constructed  on  the  first  three  function  fields  in  the  first  G-S 
tower.  Hache  extends  this  result  to  the  fourth  function  field 
over GF(16). The Weierstrass semigroup at $P$ is determined
in [4]. Other examples of asymptotically optimal towers are
provided in [1].

II.  Results 

Let $q$ be a power of a prime $p$ and consider the G-S tower
of function fields given by $T_1 = \mathbb{F}_{q^2}(x_1)$ and, for $n \ge 2$,

$T_n = T_{n-1}(x_n)$, where $x_n^q + x_n = \dfrac{x_{n-1}^q}{x_{n-1}^{q-1} + 1}$.


1This  work  was  supported  by  the  National  Science  Foundation 
under  Grant  CCR-9714626. 


Let $P_\infty^{(n)}$ denote the unique place in $T_n$ lying above $P_\infty$ and
set $g_i := x_i^{q-1} + 1$,

$S = \left\{ \prod_{i=2}^{n} g_i^{e_i} \;\middle|\; 0 \le e_i \le q - 1,\ \text{some } e_i \ne 0 \right\}$.

The  main  result  can  now  be  stated. 

Theorem 1 Every function in $T_n$ whose poles are confined to
$P_\infty^{(n)}$ can be expressed as a linear combination of functions in
the set $S$, with coefficients of the form $p(x_1)/x_1^l$, where $p(x_1)$
is a polynomial in $x_1$ and $l \ge 0$.

The talk will provide examples as well as other results
relating to the function field tower.
Abstract — We propose a lower bound for the minimum
distance of $[n, k]$ linear codes which are specified
by generator matrices whose rows are $k$ vectors of a
given sequence of $n$ linearly independent vectors over
a finite field. Note that the Feng-Rao and the order
bounds give lower bounds for the minimum distance
of the dual codes.

I.  Introduction 

Various  kinds  of  bounds  for  the  minimum  distance  of  linear 
codes  have  been  investigated  in  the  history  of  coding  theory. 
Among them, the Feng-Rao bound is one of the most distinguished
[1].

Let $F$ be a finite field and $n$ a positive integer. We denote
by $B := (b_1, b_2, \ldots, b_n)$ a sequence of $n$ linearly
independent vectors in $F^n$. For $u = (u_1, u_2, \ldots, u_n)$ and
$v = (v_1, v_2, \ldots, v_n)$ in $F^n$, let $u * v := (u_1 v_1, u_2 v_2, \ldots, u_n v_n)$.

For $B$ and a subset $G$ of $B$ with $|G| = k$ ($1 \le k \le n$), we
define a code $C(B, G)$ over $F$ by $C(B, G) := \mathrm{span}\{b : b \in G\}$
and denote its dual by $C^\perp(B, G)$. $C(B, G)$ (resp. $C^\perp(B, G)$)
is an $[n, k]$ (resp. $[n, n - k]$) linear code. We denote by $d(C)$
the minimum distance of a linear code $C$.

We denote by $L_l$ ($1 \le l \le n$) the linear space over $F$
spanned by $b_1, b_2, \ldots, b_l$ and let $L_0 := \{0\}$. For $v \in F^n \setminus \{0\}$,
let $p(v)$ denote the index $l$ such that $v \in L_l \setminus L_{l-1}$ holds,
and let $p(0) := 0$. A pair $(b_i, b_j)$ ($b_i, b_j \in B$) is said to be
well-behaving (WB) if $p(b_u * b_v) < p(b_i * b_j)$ for all $u$ and $v$ with
$1 \le u \le i$, $1 \le v \le j$ and $(u, v) \ne (i, j)$.

Proposition 1 [2, §4] For $B$ and $G$, let

$A_l := \{(i, j) : p(b_i * b_j) = l \text{ and } (b_i, b_j) \text{ is WB}\}$,

for $l = 1, 2, \ldots, n$ and define $S(B, G) := \min\{|A_l| : b_l \in B \setminus G\}$.
Then $d(C^\perp(B, G)) \ge S(B, G)$. □

$S(B, G)$ is known as the Feng-Rao bound for $d(C^\perp(B, G))$.
In this paper, we introduce a lower bound for the minimum
distance of $C(B, G)$ instead of $C^\perp(B, G)$, by using the map $p$
and the concept of well-behaving pairs as in Proposition 1.

II. A LOWER BOUND FOR $d(C(B, G))$

Theorem 1 For $B$ and $G$, let

$B'_i := \{l : p(b_i * b_j) = l \text{ for some } b_j \in B \text{ s.t. } (b_i, b_j) \text{ is WB}\}$, $i = 1, 2, \ldots, n$,

and $B_i := \{v : b_v \in B \setminus G\} \setminus B'_i$. Define $t(B, G) := \max\{|B_i| : b_i \in G\}$.
Then $d(C(B, G)) \ge n - k + 1 - t(B, G)$. □
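
The quantities in Theorem 1 can be computed by brute force for small examples. The sketch below is an illustration under stated assumptions, not the authors' procedure: it works over a prime field GF(P) only, obtains the index map by solving for the coordinates of a vector in the basis $B$, and uses hypothetical helper names (`coords_in_basis`, `index_map`, `theorem1_bound`). The example basis consists of the evaluations of $1, x, \ldots, x^6$ at the seven points of GF(7), a Reed-Solomon-type setting.

```python
def coords_in_basis(basis, v, prime):
    """Coordinates of v with respect to a basis of GF(prime)^n (Gauss-Jordan)."""
    n = len(v)
    # Augmented system: column i holds basis[i], the last column holds v.
    M = [[basis[i][r] % prime for i in range(n)] + [v[r] % prime] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, prime)      # modular inverse (Python 3.8+)
        M[col] = [a * inv % prime for a in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % prime for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def index_map(basis, v, prime):
    """The map p(v): largest 1-based l with a nonzero coordinate, 0 for v = 0."""
    c = coords_in_basis(basis, v, prime)
    return max((l + 1 for l, cl in enumerate(c) if cl), default=0)

def theorem1_bound(basis, G, prime):
    """Return (n - k + 1 - t, t) for a set G of 0-based positions in the basis."""
    n, k = len(basis), len(G)
    R = [[index_map(basis, [a * b % prime for a, b in zip(bi, bj)], prime)
          for bj in basis] for bi in basis]
    def wb(i, j):
        # Well-behaving: every smaller pair maps to a strictly smaller index.
        return all(R[u][v] < R[i][j]
                   for u in range(i + 1) for v in range(j + 1) if (u, v) != (i, j))
    outside = {l + 1 for l in range(n) if l not in G}    # indices of B \ G
    t = 0
    for i in G:
        hit = {R[i][j] for j in range(n) if wb(i, j)}     # the set B'_i
        t = max(t, len(outside - hit))                    # |B_i|
    return n - k + 1 - t, t

# Reed-Solomon-type example over GF(7): row e holds x^e evaluated at 0..6.
P = 7
B = [[pow(x, e, P) for x in range(P)] for e in range(P)]
rs_bound = theorem1_bound(B, {0, 1, 2}, P)
```

For $G = \{b_1, b_2, b_3\}$ the sketch gives $t(B, G) = 0$ and the bound $n - k + 1 = 5$, which is the true minimum distance of the corresponding [7, 3] Reed-Solomon code.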

This theorem follows from the duality theorem of generalized
Hamming weights [6] and the following proposition.

Proposition 2 Let $d_t(C)$ denote the $t$-th generalized Hamming
weight of the code $C$. Then $d_t(C^\perp(B, G)) = k + t$ for all
$t$ with $t(B, G) + 1 \le t \le n - k$. □


This proposition was first shown for $G = \{b_1, b_2, \ldots, b_k\}$
[3, Theorem 2], and it can be shown to hold for an arbitrary subset
$G$ of $B$ with $|G| = k$.

For given $B$ and an integer $r$, let $G' := \{b_i : |B_i| \le r\}$.
Then $t(B, G') \le r$ and therefore $d(C(B, G')) \ge n - k + 1 - r$
by Theorem 1. Moreover, if $t(B, G') = t(B, G)$ then $G \subseteq G'$.
Thus if $t(B, G') = r$, then $C(B, G') \supseteq C(B, G)$ for all $G \subseteq B$
with $t(B, G) = r$. This means that for fixed $B$ and $r$, the
dimension of $C(B, G')$ is $|G'|$ and is the largest among all
dimensions of codes $C(B, G)$ with $t(B, G) = r$. This idea of
defining $G'$ corresponds to the improved geometric Goppa codes
for $C^\perp(B, G)$ [2, §4.3].

III.  Applications 

For  Reed-Solomon  and  Reed-Muller  codes,  we  can  show  that 
Theorem  1  gives  the  true  minimum  distance  [3,  4]. 

For one-point algebraic geometry (AG) codes on $C_{ab}$
curves [5], if a $C_{ab}$ curve is non-singular and absolutely
irreducible, then we can show that $t(B, G) \le g$ [3], where $g$
is the genus of the $C_{ab}$ curve, and $B$ and $G$ are determined
so that $C(B, G)$ becomes an L-type AG code on the $C_{ab}$
curve. Since an L-type AG code is an $[n, k, d]$ code with
$d \ge n - k + 1 - g =: d^*$ [2, Theorem 2.65], this result implies
that the lower bound given in Theorem 1 is at least as good as
$d^*$.

For  evaluation  codes  [2,  §4],  a  lower  bound  based  on  the 
weight function has been investigated [2, §5]. When the one-point
AG codes on $C_{ab}$ curves considered above are regarded
as  evaluation  codes,  this  bound  is  equal  to  d*  and  therefore 
the  proposed  bound  is  better.  For  other  evaluation  codes, 
relations  between  the  two  bounds  are  left  for  further  study. 
Abstract — We give lower bounds on the state complexity
of geometric Goppa codes. For Hermitian
codes we calculate the DLP bound, V, and determine
when V is tight and when it is not.

I.  Introduction 

Geometric Goppa codes (also called algebraic-geometric
codes) are a family of powerful codes that can be longer than
Reed-Solomon codes. Hermitian codes are a particularly good
family of geometric Goppa codes. State complexity (SC) is
used as a measure of the complexity of soft-decision decoding
algorithms, such as the Viterbi algorithm. The SC of a code
varies with the coordinate order. We refer to the lowest SC over
all coordinate orders as absolute state complexity (ASC). According
to Massey, determining an order that achieves ASC is
‘the art of trellis decoding’. The DLP bound, V, is an order-free
lower bound on SC. In particular, if an order attains V
then it achieves ASC.

II.  On  the  SC  of  Geometric  Goppa  Codes 

Our notation and terminology for geometric Goppa codes
follow Stichtenoth’s book. We fix a function field $F/\mathbb{F}_q$ of
genus $g$ and an $[n, k]$ geometric Goppa code $C_{\mathcal{L}}(D, G)$ from
$F/\mathbb{F}_q$, where $D = \sum_{j=1}^{n} P_j$. The abundance of $C_{\mathcal{L}}(D, G)$ is
$a = \dim(G - D)$. The usual expression for state space dimension
at depth $i$ in terms of dimensions of past and future
truncated codes becomes

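$s_i(C_{\mathcal{L}}(D, G)) = k + 2a - \dim(G - D_i^-) - \dim(G - D_i^+)$ (1)

where $D_i^- = \sum_{j=1}^{i} P_j$, $D_i^+ = D - D_i^-$ and $0 \le i \le n$. As
usual, $s(C_{\mathcal{L}}(D, G)) = \max_{0 \le i \le n} \{s_i(C_{\mathcal{L}}(D, G))\}$.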

Almost immediately from (1) we get that the SC of
$C_{\mathcal{L}}(D, G)$ reaches Wolf’s upper bound, $\min\{k, n - k\}$, if
deg  G  <  pA  or  deg  G  >  +  2 g.  For  the  rest  we  assume 

that  <  degG  <  +  g.  (The  results  for  2L=T  +  g  < 

deg  G  <  pp  +  2 g  follow  by  duality.)  A  first  lower  bound  on 
SC  can  be  deduced  from  Clifford’s  Theorem. 

Result 1 (Clifford Bound) $s(C_{\mathcal{L}}(D, G)) \ge k + 2a - \deg G +$

r^i-  . . 

Two other lower bounds can be derived in terms of the
gonality sequence, $(g_k)_{k \ge 1}$, of $F/\mathbb{F}_q$. The gonality sequence
is known from $g$ and the degree of $F/\mathbb{F}_q$ [2]. One of these
bounds is derived from a known bound on the generalised
weight hierarchy of $F/\mathbb{F}_q$ [1, 3]; the other is derived directly
from (1). Often the two bounds are equal and then

Result 2 (GS Bound) $s(C_{\mathcal{L}}(D, G)) \ge \max_{0 \le i \le n} \{k + 2a - |\{r : g_r \le \deg G - i\}| - |\{r : g_r \le \deg G + i - n\}|\}$.

Example 3 When $F/\mathbb{F}_q$ is hyperelliptic the Clifford bound
and the GS bound agree.
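
The GS bound of Result 2 is a maximum over the depth $i$ of an expression that only counts gonality values, so it is easy to evaluate once the gonality sequence is available. The sketch below is illustrative only: the function name `gs_bound` and its argument layout are assumptions, and the gonality sequence must be supplied by the caller. The example uses the rational function field (genus 0), where a divisor of degree $d \ge 0$ has dimension $d + 1$, so that $g_r = r - 1$.

```python
def gs_bound(n, k, a, deg_G, gonality):
    """GS lower bound on state complexity.

    gonality is the list [g_1, g_2, ...] in increasing order; it must contain
    every entry that does not exceed deg_G.
    """
    def count_le(d):
        # Number of r with g_r <= d, an upper bound on the dimension of a
        # divisor of degree d.
        return sum(1 for g in gonality if g <= d)
    return max(k + 2 * a - count_le(deg_G - i) - count_le(deg_G + i - n)
               for i in range(n + 1))

# Genus-0 illustration: g_r = r - 1, deg G = 2, hence k = 3 and a = 0 for n = 7.
n, deg_G = 7, 2
bound = gs_bound(n, k=deg_G + 1, a=0, deg_G=deg_G, gonality=list(range(n + 1)))
wolf = min(deg_G + 1, n - (deg_G + 1))   # Wolf's upper bound min{k, n - k}
```

In this genus-0 case the lower bound coincides with Wolf's upper bound, as expected for an MDS code.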

III.  Towards  the  ASC  of  Hermitian  Codes 

Hermitian codes are defined from the Hermitian function
field, $H/\mathbb{F}_{q^2}$, of genus $g = \binom{q}{2}$. For Hermitian codes $n = q^3$ and
$G = mQ_\infty$, where $Q_\infty$ is the place of degree one at infinity.
We put $C_m = C_{\mathcal{L}}(D, mQ_\infty)$ and $k_m = \dim(C_m)$. We are
interested  in  2^1  <  m  <  2^2  +  g.  By  results  of  [2,  3],  the 
DLP  bound  of  an  Hermitian  code  is  equal  to  the  GS  bound. 

Result 4 (DLP Bound for Hermitian Codes) With
$n - 2m + 4g + q - 2 = uq + v$, where $0 \le v \le q - 1$,

V(C-)  -  *-  ('  ■ -211 ') -  (5 V*1)  -™  {« -  fll  ■  1 - »} 

In  some  cases  we  can  improve  on  the  DLP  bound.  We  write 
m  =  |^-yj  q  +  M'{q  +  1)  +  M°,  where  0  <  M°  <  q. 

Result  5  With  qi=q  mod  2,  s(Gm)  -  V(Cm)  is  at  least 

1  +  AT  +  M°  -  L|J  if  If  J  -  AT  <  M°  < 
m  -  M°  if  a=fL  <  M°  <  [4=2) 

1  +  M'+M°-q  ifq-M*<M°<q-^A 

1  +q-q2-M°  ifq-Ml<M0<q-q2. 

In  particular  the  DLP  bound  cannot  be  tight  if  [2J  —M*  < 
M°  <  or  q  -  AT  <  M°  <  q  -  q2. 

We  have  found  a  coordinate  order  on  Cm  that  achieves  the 
DLP  bound  whenever  this  is  not  ruled  out  by  Result  5.  Thus 
this  determines  exactly  when  the  DLP  bound  for  the  SC  of 
Hermitian  codes  is  tight. 

However, when the DLP bound is not tight, the coordinate
order does not always achieve the bound of Result 5.
In these cases we have not ascertained the ASC of $C_m$. The
first values of $m$ for which this is the case are $q = 5$ and
$m = 70$; $q = 7$ and $m \in \{182, 189, 190\}$; and $q = 8$ and
$m \in \{268, 272, 276, 277, 280, 281\}$.

This  work  was  supported  by  EPSRC  grant  L88764. 
Abstract — We investigate iterative decoding and
channel estimation for multiple-access channels. Results
are obtained concerning the fixed points of such
iterations.

I.  Iterative  Receiver  Principle 

In [1] an iterative receiver was proposed for the linear multiple-access
channel. We now consider an approach for integrating
channel estimation into this technique, whereby we use the
a-posteriori probabilities of the information symbols as uncertain
training sequences for the purposes of channel estimation.
We investigate the properties of fixed points of such iterations.

Let $\mathcal{S}$ be a vector space. Unconstrained sequences can
take any value $u \in \mathcal{S}$, as opposed to constrained sequences
$x \in \mathcal{C} \subset \mathcal{S}$. We are interested in low-complexity joint detection
(or estimation) for sets of constrained sequences observed
according to known transition probabilities. These probabilities
are defined by some combination of deterministic mappings
(e.g. linear combining) and non-deterministic perturbations
(e.g. noise).

Suppose that the sequences $x_k$, $k = 1, \ldots, n$, are each produced
by a mapping $C_k$ of an unconstrained sequence $u_k$. The
random sequence $y$ is observed according to $p(y \mid x_1, \ldots, x_n)$.
The $u_k$ may or may not be independent, but are conditionally
dependent given $y$. This model can be thought of as
a multiple-access communications system (the $x_k$ are coded
information sequences), but is rich enough to describe other
systems of interest, such as inter-symbol interference channels
(by allowing some of the $x_k$ to represent the sequence of channel
taps, obeying known spectral constraints) and space-time
diversity channels.

Optimal detection means the determination of either the
posterior density $p(u_1, u_2, \ldots, u_n \mid y)$, or its marginals, taking
into account the constraints. This is usually an NP-complete
problem and we propose a reduced-complexity iterative algorithm.
The basic principle that we propose for the design of such
algorithms may be stated concisely as follows.

1.  Incorporate  dependence,  ignore  constraints. 

2.  Incorporate  constraints,  ignore  dependence. 

We iteratively update the distributions $p_k(u_k)$. Ideally $p_k$ converges
over the iterations to the $k$-th marginal of the true posterior
distribution $p(u_1, u_2, \ldots, u_n \mid y)$. The principle also applies
to estimation problems, in which case the distributions are
replaced with the current estimates, which we hope converge
to some desired estimator, e.g. MMSE.

Let $p = \{p_1(u_1), p_2(u_2), \ldots, p_n(u_n)\}$ be the sequence priors.
At the conclusion of any iteration step, the unconstrained
joint detector, using as priors the current set of marginal distributions
$p$, produces a new set $p^+$, taking into account only
the conditional dependencies. All the constraints are relaxed.
This results in a $p^+$ that may place mass on “impossible”
events. Relaxation of (especially integer) constraints can result
in low-complexity heuristics. An example of this is applying
the decorrelator or MMSE filter for detection with a
linear model with integer constraints.

A bank of constrained detectors ignores the inter-dependencies
between the $u_k$. The detector for $u_k$ updates
the current prior marginal $p_k$ based on the constraint $C_k$ and
$p(y \mid u_k)$. For convolutionally coded data, we may use the
forward-backward algorithm. For a sequence of channel taps
we may use a Kalman filter.


II.  Convergence  Analysis 

We shall now consider an asynchronous $K$-user CDMA system
in the absence of multipath fading. Identical convolutional
codes with free distance $d_{\mathrm{free}}$ are used by each transmitter.

We are interested in the effective noise variance at the output
of each iteration. Considering an input noise variance $v$
to the constrained data estimator (Viterbi decoder), we may
bound the output noise variance $v_d$:


$v_d \le f(v) = 4 d_{\mathrm{free}}\, Q(\ldots)$


For a given spreading factor $\beta = K/N$, input variance $v_d$ and
thermal noise variance $\sigma^2$, the unconstrained joint detector
described  in  [1]  is  characterized  by  Vd  —  pv  +  <r2.  This  leads 
to  the  recurrence 


$v_d^{(m+1)} = F\big(v_d^{(m)}\big) = 4 d_{\mathrm{free}}\, Q(\ldots)$


In operating regions of interest, we may use the solutions to
the fixed point equation $v_d = F(v_d)$ to accurately predict the
performance. Furthermore, $v_d^{(m)}$ may be used to predict the
performance for a finite number of iterations, $m$.
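
The recurrence and the fixed-point equation can be explored numerically. The sketch below is only an illustration: the argument of $Q$ is not legible in the text, so the form of `F` used here, $4 d_{\mathrm{free}} Q(\sqrt{d_{\mathrm{free}} / (\beta v_d + \sigma^2)})$, is an assumption made for the example, as are the parameter values and all function names.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def make_F(d_free, beta, sigma2):
    """Build an ASSUMED update map F; the exact argument of Q is not in the text."""
    def F(v_d):
        v = beta * v_d + sigma2               # assumed variance seen by the decoders
        return 4.0 * d_free * q_func(math.sqrt(d_free / v))
    return F

def iterate(F, v0, steps):
    """Return the trajectory v^(0), v^(1), ..., v^(steps) of v^(m+1) = F(v^(m))."""
    traj = [v0]
    for _ in range(steps):
        traj.append(F(traj[-1]))
    return traj

def slope(F, x, h=1e-7):
    """Central-difference estimate of F'(x), used for the stability check."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

F = make_F(d_free=10, beta=0.5, sigma2=0.5)
traj = iterate(F, v0=1.0, steps=50)
x_star = traj[-1]                              # approximate fixed point
stable = abs(slope(F, x_star)) < 1.0           # sufficient condition F'(x) < 1
```

Intermediate entries of `traj` play the role of $v_d^{(m)}$ for a finite number of iterations, and the last entry approximates a stable fixed point when the slope test passes.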

Given  a  fixed  point  solution  x,  we  have  the  following  suffi¬ 
cient  condition  for  stability 


„  1  (  2  dfr, 

0  <  x  <  -j  — — — ^ 

P  V  3  In  dfree 


F'(x)  <  1. 


In  practice,  we  have  observed  the  existence  of  a  stable  fixed 
point  close  to  the  single  user  operating  point  and  it  is  possible 
to  derive  an  expression  for  the  loss  compared  to  single  user  for 
this  point.  Decoder  failure  occurs  when  a  second  fixed  point 
appears at high noise variance. It can be shown that for high
SNR this occurs for a critical value of $\beta$ given by


fcrit  =  (2-u)/r1(2). 


We  have  verified  this  behavior  with  simulations. 
Abstract — Parallel concatenated convolutional
codes (PCCC’s) are usually constructed using systematic
recursive convolutional codes (SRCC’s) as
constituent codes. In this paper, we introduce a
new version of parallel concatenation that uses non-systematic
recursive convolutional codes (NSRCC’s)
as constituent codes. A systematic constituent code
then becomes a particular case of this general scheme.
The use of this larger class of constituent codes enlarges
the number of possible codes in the search
space, thus allowing the possibility of finding better
codes. We also introduce a modified iterative decoding
method for this more general form of parallel concatenation.
The decoding technique is no more complex
than the standard iterative decoding algorithm.

I.  Introduction 

The usual view of parallel concatenation is two systematic
recursive convolutional codes (SRCC’s) linked by an interleaver
[1]. The systematic bits, which are common to both constituent
codes, are transmitted only once, and the two decoders
“share” the noisy received systematic symbols. Iterative decoding
is then accomplished by exchanging extrinsic reliability
information about the systematic bits between the two
decoders.

In this paper we propose a class of parallel concatenated
convolutional codes (PCCC’s) that uses non-systematic recursive
convolutional codes (NSRCC’s) as constituent codes.
This class of NSRCC’s contains the usual SRCC’s
as a particular case. We also propose a modified iterative
decoding method for these more general PCCC’s.

We define an NSRCC as a convolutional code with generator
matrix $[\, n_1(D)/d_1(D) \quad n_2(D)/d_2(D) \,]$. (Note that this
NSRCC becomes systematic if $n_1(D) = d_1(D)$ or $n_2(D) = d_2(D)$.)
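
To make the definition concrete, the following sketch encodes a bit sequence with a rate-1/2 recursive encoder over GF(2). It is an illustration rather than the authors' encoder: it assumes that both outputs share a single feedback polynomial $d(D)$ with $d_0 = 1$, represents polynomials as coefficient lists with the lowest degree first, and uses the hypothetical function name `rsc_encode`.

```python
def rsc_encode(bits, num_polys, den):
    """Encode with generator [n_1(D)/d(D), n_2(D)/d(D), ...] over GF(2).

    Polynomials are coefficient lists, lowest degree first; den[0] must be 1.
    Returns one output bit list per numerator polynomial.
    """
    m = max(len(den), *(len(n) for n in num_polys)) - 1   # encoder memory
    den = den + [0] * (m + 1 - len(den))
    nums = [n + [0] * (m + 1 - len(n)) for n in num_polys]
    state = [0] * m                       # state[j] holds w_{t-1-j}
    out = [[] for _ in nums]
    for u in bits:
        # Feedback: w(D) = u(D)/d(D), i.e. w_t = u_t + sum_{j>=1} d_j w_{t-j}.
        w = u
        for j in range(1, m + 1):
            w ^= den[j] & state[j - 1]
        taps = [w] + state
        # Feed-forward: y(D) = n(D) w(D) for every numerator.
        for o, n in zip(out, nums):
            y = 0
            for j in range(m + 1):
                y ^= n[j] & taps[j]
            o.append(y)
        state = taps[:m]
    return out

d = [1, 1, 1]                                   # d(D) = 1 + D + D^2
msg = [1, 0, 1, 1, 0, 0, 1]
systematic_case = rsc_encode(msg, [d, [1, 0, 1]], d)        # n_1 = d
impulse = rsc_encode([1, 0, 0, 0, 0, 0], [[1, 0, 1]], d)    # (1 + D^2)/(1 + D + D^2)
```

When a numerator equals the feedback polynomial, the corresponding output reproduces the input, which is the systematic special case noted above; any other numerator gives a non-systematic output.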

We  now  propose  a  PCCC  scheme  as  shown  in  Fig.  1.  The 
first  constituent  code  is  a  rate  1/2  NSRCC  in  the  previously 
described  form  and  the  second  constituent  code  is  similar  to 
the usual PCCC’s, i.e., the systematic bits are not transmitted.

The block diagram of the decoder is shown in Fig. 2. For
each a posteriori probability (APP) decoder we use the standard
BCJR [2] algorithm.

When $y_1$ or $y_2$ is a systematic bit, this algorithm gives a
result that is identical to the classical PCCC iterative decoding
algorithm. In the classical PCCC iterative decoding algorithm,
besides $y_3$ and the extrinsic a priori likelihood ratios
of the systematic bits provided by APP1, APP2 also receives

1This  work  was  supported  by  NSF  Grants  NCR95-22939  and 
NCR96-96065  and  NASA  Grants  NAG5-557  and  NAG5-8355. 
Figure 1: A general PCCC.

noisy systematic symbols ($y_1$ or $y_2$) as inputs. For the new
decoder shown in Fig. 2, the received noisy systematic symbols
are included in the information sent from APP1 to APP2
when the code is systematic. The feedback from APP2 to
APP1, however, is identical to classical PCCC iterative decoding,
i.e., only “extrinsic” likelihood ratios are sent in this
case.


Figure  2:  Block  diagram  of  the  decoder. 

Because we have lifted the restriction of using SRCC’s as
constituent codes, the number of possible constituent codes
is now much larger than for classical PCCC’s. We have attempted
a limited search for good PCCC’s using NSRCC’s
as constituent codes. A PCCC with generator matrices
$[\, (1+D)/(1+D+D^2) \quad (1+D+D^3)/(1+D+D^2) \,]$ and
$[\, (1+D^3+D^4)/(1+D+D^2) \,]$ has been identified as a good choice.
Its BER performance is nearly the same as that of the original
turbo code in [1] for an information block length of 1024 and
rate 1/3. Note, however, that this non-systematic code has a
smaller state complexity.
Abstract — The concept of iterative decoding of
concatenated channel codes is applied to joint source-channel
decoding (JSCD). Extrinsic information from
the soft-in/soft-out channel decoder is used as a-priori
information for the new soft-in/soft-out source decoder
and vice versa. In this novel iterative approach
the redundancies within the data bits and the channel
codewords are alternately exploited in order to
approximate the highly complex optimal JSCD.

Iterative  Source-Channel  Decoding 

Consider the problem of transmitting a set of $M$ signal vectors
$X_k^{(1)}, \ldots, X_k^{(M)}$ at each time $k$ (Fig. 1). The vectors are
source-encoded (quantized) by the indices $I_k^{(j)}$, $j = 1, \ldots, M$. The
index bits are interleaved, commonly channel-encoded, and
the codewords $V_k$ are transmitted. Since source encoding is
never “perfect”, some dependencies (modeled by first-order
Markov processes) remain between adjacent indices $I_{k-1}^{(j)}$, $I_k^{(j)}$.

The basic idea of iterative source-channel decoding is
adopted from iterative channel decoding [1]: the redundancies,
which are contained in the channel codewords and in the
source-encoder indices, are alternately exploited by separate
soft-in/soft-out decoders (SISO decoders). Each SISO decoder
computes the new (extrinsic) part of information on the data
bits, which is based on only one type of redundancy. The
extrinsic information is forwarded to the other SISO decoder
as a-priori information. This process is repeated iteratively to
improve the reliability of the index bits step by step. A block
diagram of such an iterative source-channel decoder, which
directly fits into Figure 1, is depicted in Figure 2.


Fig.  1:  Transmission  system 

The SISO channel decoder (e.g. BCJR algorithm [1], [2])
processes the a-priori information $L_a^{(C)}(\pi(I_k))$ of the index bits and
the received channel L-values $L_c \cdot V_k$ of the transmitted bits,
and it computes the output L-values $L^{(C)}(\pi(I_k))$ and the extrinsic
information $L_e^{(C)}(\pi(I_k))$ for the index bits. The latter
contains the new part of information that has been computed
by only exploiting the redundancies within the channel code.

SISO source decoding is performed by Optimal Estimation
(OE) [3]. The channel values $I_k$ of the index bits, the a-priori
information $L_a^{(S)}(I_k)$, which equals the extrinsic information
$L_e^{(C)}(I_k)$ from the channel decoder, the transition probabilities
of the Markov models and the a-posteriori probabilities
(APPs) from the previous time $k - 1$ are processed in order
to compute the APPs of all possibly transmitted indices at
time $k$ by the recursion given e.g. in [3]. The APPs are used
to estimate the receiver outputs $\hat{X}_k^{(j)}$ after the last iteration.
Within the iterations the SISO decoder for a binary channel
code requires a-priori information for single bits. Since OE
computes APPs of indices, a conversion to L-values for the bits
has to be carried out; this can be realized by summing up
the APPs over all possible indices having a “1” or “0” at the
bit position under consideration. The L-values of the index
bits are converted to the output L-values $L^{(S)}(I_k)$ of the SISO
source decoder by multiplexing, and the extrinsic information
$L_e^{(S)}(I_k)$ is computed. It is interleaved and forwarded to the
SISO channel decoder as the a-priori information $L_a^{(C)}(\pi(I_k))$.
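
The conversion from index APPs to bit L-values described above can be sketched in a few lines. This is an illustrative sketch, not the decoder of the paper: it assumes natural binary labelling of the indices with the most significant bit first, the sign convention $L = \ln(P(\text{bit}=0)/P(\text{bit}=1))$, and strictly positive APP sums; the function name is hypothetical.

```python
import math

def index_apps_to_bit_llrs(apps, num_bits):
    """apps[i] is the APP of index i; returns one L-value per bit, MSB first."""
    llrs = []
    for pos in range(num_bits - 1, -1, -1):
        # Sum the APPs of all indices with a 0 (resp. 1) at this bit position.
        p0 = sum(pr for i, pr in enumerate(apps) if not (i >> pos) & 1)
        p1 = sum(pr for i, pr in enumerate(apps) if (i >> pos) & 1)
        llrs.append(math.log(p0 / p1))     # assumed sign: L > 0 favors bit 0
    return llrs

apps = [0.4, 0.3, 0.2, 0.1]                # APPs of the indices 00, 01, 10, 11
llrs = index_apps_to_bit_llrs(apps, 2)
```

For these APPs the first bit has probabilities 0.7 and 0.3 and the second bit 0.6 and 0.4, so both L-values are positive and the first is larger.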

Simulation results show that the iterative source-channel
decoding works better than the non-iterative sequential channel
and source decoding with the same component algorithms.
A gain of about 1 dB in $E_b/N_0$ is achieved by only two iterations
on moderately corrupted channels at the same quality
of transmission.
Fig. 2: Iterative source-channel decoding. Information is passed by log-likelihood ratios for single bits [2] being grouped into vectors.
Abstract — This paper presents novel decoding algorithms for
turbo codes, in which the likelihood and channel values are updated
so that they move closer to the true values through
the iterative decoding procedure. Criteria for updating the
likelihood and channel values are proposed; they are based on a
simple comparison of the interim hard-decision results from each
of the component decoders.


I.  INTRODUCTION 

Parallel concatenated convolutional (turbo) codes with iterative
decoding achieve error performance close to the Shannon limit [1].
The principle of iterative decoding is that the component
decoders exchange their likelihood-value outputs with each
other and update them through the iterative procedure. In the
conventional decoding algorithms, only the likelihood values for the
systematic parts are updated [1], [2].

Here  we  propose  novel  decoding  algorithms  which  update 
both  likelihood  and  channel  values  based  on  the  interim  hard 
decision  results  in  order  to  minimize  the  effects  of  the  error 
contained  in  those  values. 

II.  THE  ALGORITHMS 

The iterative decoder in accordance with the proposed
algorithm is shown in Fig. 1. Since the likelihood values are
the log-likelihood ratios (LLRs) of the decoded digits, the hard
decisions on the LLR values will be the final decoded results.
Therefore, if the signs of the LLR values output by the component
decoders differ from each other, at least one of them must
be in error. Based on this, when an error is detected in the LLR
values ($L_1$ or $L_2$), the value should be updated to a more reliable one.
Here we take a simple method and update the LLR values by averaging them, as
shown in Fig. 2. This makes the absolute value of the updated LLR
smaller, so the effect of LLR errors can be reduced.
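
The sign-comparison rule can be sketched as follows. This is a minimal illustration of the idea and not the exact update of Fig. 2: the text does not specify which decoder's value is kept when the decisions agree, so keeping the first decoder's value is an assumption, as is the function name.

```python
def update_on_disagreement(l1, l2):
    """Average the two LLRs wherever their hard decisions (signs) differ."""
    updated = []
    for a, b in zip(l1, l2):
        if a * b < 0:                 # signs differ: at least one decision is wrong
            updated.append((a + b) / 2.0)
        else:                         # decisions agree: keep the first decoder's value
            updated.append(a)
    return updated

new_l = update_on_disagreement([2.0, -1.0, 0.5], [1.0, 3.0, -0.25])
```

Where the signs differ, the average has a smaller magnitude than the larger of the two inputs, so a confident but wrong value is pulled toward zero.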

The systematic part of the channel value, U, is similarly
updated by comparing it with the corresponding LLR values.
Further, to update the redundant parts of the channel
values, $Y_1$ and $Y_2$, the LLR values are re-encoded as soft values (using
log-likelihood addition). The channel values are then compared with
these re-encoded values to detect errors, and updated in the same way.

III.  PERFORMANCE  EVALUATION 

The performance of the proposed algorithm is evaluated with the
parameters specified in the IMT-2000 draft [3], i.e., r = 1/3, K = 3,
multi-stage interleaver, etc. The results are shown in Figs. 3 and 4,
where SOVA-based [2] component decoders are employed.

It  is  shown  that  the  proposed  algorithm  improves  BER 
performance  as  iterations  go  on.  Also,  the  achievable  BER 
limit  is  improved  by  the  effective  updating  method.  Moreover, 
it  is  possible  with  the  proposed  algorithm  to  reduce  the  decoding 
process  time  by  stopping  the  iterations  much  earlier. 

IV.  CONCLUSIONS 

Novel decoding algorithms for turbo codes that update the likelihood
and channel values based on the interim hard decision results have
been presented. By updating those values more reliably, the proposed
algorithms improve BER performance and reduce the number of
iterations.


Fig.  1  Iterative  decoder  updating  LLR  and  channel  values 


Fig. 2  Likelihood  value  updating  (ex.  following  Dec  1) 


Fig. 3  Bit error rate as a function of Eb/N0 (AWGN)

Fig. 4  Bit error rate as a function of iterations (AWGN)

Abstract — The high-rate squared-error distortions of a balanced
multiple description lattice vector quantizer are analyzed for a
memoryless source with probability density function $p$, differential
entropy $h(p) < \infty$, and lattice codebook $\Lambda$. For any
$a \in (0,1)$ and rate pair $(R, R)$, it is shown that the two-channel
distortion $d_0$ and the channel 1 (or channel 2) distortion $d_s$
satisfy

$$\lim_{R \to \infty} d_0 \, 2^{2R(1+a)} = \frac{1}{4}\, G(\Lambda)\, 2^{2h(p)}$$

and

$$\lim_{R \to \infty} d_s \, 2^{2R(1-a)} = G(S_L)\, 2^{2h(p)},$$

where $G(\Lambda)$ is the normalized second moment of a Voronoi cell of
the lattice $\Lambda$ and $G(S_L)$ is the normalized second moment of a
sphere in $L$ dimensions.


I.  Introduction 

We consider a two-channel multiple description quantization system
for a discrete-memoryless source with differential entropy $h(p)$. The
quantizer transmits information on each channel at rate $R$
bits/sample. The mean-squared error when both channels work is denoted
by $d_0$, and when only one channel works by $d_s$.

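It has been shown [1] that for a uniform entropy-coded multiple
description quantizer and any $a \in (0,1)$ the distortions satisfy

$$\lim_{R \to \infty} d_0(R)\, 2^{2R(1+a)} = \frac{1}{4}\left(\frac{2^{2h(p)}}{12}\right), \qquad
\lim_{R \to \infty} d_s(R)\, 2^{2R(1-a)} = \frac{2^{2h(p)}}{12}. \tag{1}$$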

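On the other hand, by using a random quantizer argument it was shown
[2] that by encoding vectors of infinite block length, it is possible
to achieve distortions

$$\lim_{R \to \infty} d_0(R)\, 2^{2R(1+a)} = \frac{1}{4}\left(\frac{2^{2h(p)}}{2\pi e}\right), \qquad
\lim_{R \to \infty} d_s(R)\, 2^{2R(1-a)} = \frac{2^{2h(p)}}{2\pi e}. \tag{2}$$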


Thus in multiple description quantization it is possible to achieve a
reduction in the granular distortion by 1.53 dB, simultaneously for
the two-channel and the side distortion.
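
As a quick numerical check of the 1.53 dB figure: it is the ratio, in
decibels, between the normalized second moment 1/12 of a scalar uniform
quantizer cell and the limiting value 1/(2*pi*e) approached by spheres as
the dimension grows. The function name below is hypothetical.

```python
import math

def granular_gain_db():
    """Ratio of 1/12 to 1/(2*pi*e), expressed in dB."""
    scalar_moment = 1.0 / 12.0                      # uniform scalar cell
    sphere_limit = 1.0 / (2.0 * math.pi * math.e)   # infinite-dimension limit
    return 10.0 * math.log10(scalar_moment / sphere_limit)

print(round(granular_gain_db(), 2))  # 1.53
```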

The goal of this paper is to analyze constructions given in [3] for
closing this “1.53 dB” gap. The system to be analyzed is illustrated
in Fig. 1. Our approach is as follows. From classical quantization
theory, we know that the gap between scalar quantization and the
rate-distortion bound may be closed by using vector quantizers with
lattice codebooks. Certainly, by following this approach we can also
close the gap between the two-channel distortion and the
rate-distortion bound. In particular, this will allow us to replace
the factor (1/12) in the expression for $d_0$ in (1) with $G(\Lambda)$,
the normalized second moment of the Voronoi region of a lattice point.
The main question we address here is that of simultaneously reducing
$d_s$. How can such a reduction be achieved, and what is the quantity
that will replace the factor (1/12) in the expression for $d_s$ in (1)?
We will show through a constructive procedure that the distortion
$d_s$ can be reduced by solving a specific labeling problem. To our
surprise, the quantity that replaces (1/12) is $G(S_L)$, the normalized
second moment of a sphere in $L$ dimensions.

Figure 1: A multiple description vector quantizer with a lattice
codebook.

For details the reader is referred to the full paper [4], which will
be published elsewhere.

Abstract — We study the problem of finding the optimal overcomplete
(frame) expansion and bit allocation for multiple description
quantization of a Gaussian signal at high rates over a lossy channel.

I.  Introduction 

The setup is shown in Figure 1. In multiple description quantization
using overcomplete (frame) expansions [1, 2], an input signal
$x \in \mathbb{R}^K$ is represented by a vector
$y = Fx \in \mathbb{R}^N$, $N > K$. $F$ is an $N \times K$ matrix,
called the frame operator. It is assumed that any $K$ rows of $F$ span
$\mathbb{R}^K$. The coefficients of $y$ are scalar quantized to obtain
$\hat{y}$, and are then independently entropy coded using on average a
total of $R$ bits allocated among the $N$ coefficients. In channel
state $s$, the decoder receives $N_{r,s} \le N$ coefficients after
potential erasures, and reconstructs the signal $\hat{x}$ from the
received coefficients. The number of channel states is $2^N$, since
each coefficient can be either received or lost. For a given
distribution over channel states, we wish to find the frame operator
$F$ and the bit allocation for the transform coefficients that
minimize the expected squared error $D = E[\|x - \hat{x}\|^2]$ subject
to a constraint on the average rate $R$, for asymptotically large $R$
and Gaussian $x$.

II.  Analysis 

Without loss of generality, assume that $x$ is distributed with zero
mean and diagonal covariance matrix
$R_{xx} = \mathrm{diag}(\sigma_0^2, \ldots, \sigma_{K-1}^2)$ (otherwise
the KLT can be applied first). Let $q = y - \hat{y}$ be the
quantization error and let $e = x - \hat{x}$ be the reconstruction
error. At high rate, assume $q$ is distributed with zero mean and
diagonal covariance matrix with
$E[\|q_i\|^2] = c\,\sigma_{y_i}^2\,2^{-2R_i}$, where $c = \pi e/6$ if
entropy coded uniform scalar quantization is used. The distortion can
be written as $D = \sum_s p_s D_s$, where
$D_s = E[\|e\|^2 \mid S = s]$, and $p_s$ is the probability of the
channel being in state $s$. Let $y_{r,s}$ denote the
$N_{r,s}$-dimensional vector of received coefficients. Let $F_{r,s}$
be the $N_{r,s} \times K$ matrix consisting of the rows of $F$
corresponding to the received coefficients.

To obtain an expression for $D_s$, there are two cases to consider:
$N_{r,s} \ge K$ and $N_{r,s} < K$. When $N_{r,s} \ge K$, the decoder
has enough information to localize the input vector to a finite cell.
Although the actual reconstruction will use a consistent
reconstruction [1, 3], for analysis purposes we use the optimal linear
reconstruction $\hat{x} = F_{r,s}^{+}\hat{y}_{r,s}$, where
$F_{r,s}^{+}$ is the pseudo-inverse of $F_{r,s}$. Since
$x = F_{r,s}^{+} y_{r,s}$, the conditional distortion can be written
as $D_s = E[\|e\|^2 \mid S = s] = E[\|F_{r,s}^{+} q_{r,s}\|^2]$. When
$N_{r,s} < K$, there is not enough information to localize $x$ to a
finite cell. In particular, $x$ is bounded in $N_{r,s}$ dimensions and
unbounded in $K - N_{r,s}$ dimensions. Thus,
$x = F_{r,s}^{+} y_{r,s} + (F_{r,s}^{\perp})^T y_{r,s}^{\perp}$, where
the rows of $F_{r,s}^{\perp}$ form an orthonormal basis for the
subspace orthogonal to the span of the rows of $F_{r,s}$, and
$y_{r,s}^{\perp}$ is a $(K - N_{r,s})$-dimensional vector. Now the
optimal linear reconstruction is
$\hat{x} = F_{r,s}^{+}\hat{y}_{r,s} + (F_{r,s}^{\perp})^T E[y_{r,s}^{\perp} \mid y_{r,s} = \hat{y}_{r,s}]$,
which gives a distortion of
$D_s = E[\|F_{r,s}^{+} q_{r,s}\|^2] + E[\|y_{r,s}^{\perp}\|^2 \mid y_{r,s} = \hat{y}_{r,s}]$.
Since the source is Gaussian,
$E[\|y_{r,s}^{\perp}\|^2 \mid y_{r,s}]$ can be easily computed.

Fig. 1: System setup.

Fig. 2: Results for optimal 3 x 2 expansion: (a) $\theta_i$ (b)
$\varphi_i^T$ for loss probabilities of 0.2 (top) and 0.95 (bottom).

Using the equations for $D_s$ and the fact that $E[qq^T]$ is diagonal,
the portion of distortion that can be minimized by bit allocation can
be written as $D_b = \sum_{i=0}^{N-1} a_i \sigma_{y_i}^2 2^{-2R_i}$,
where $a_i$ is a function of the transform $F$, the channel state
probabilities $p_s$, and the quantization constant $c$. Let $D_{nb}$
be the remaining portion of the distortion $D$. Minimizing $D_b$ is a
classic bit allocation problem with solution given by
$R_i = R/N + \frac{1}{2}\log_2\left(a_i \sigma_{y_i}^2 / (\prod_{j=0}^{N-1} a_j \sigma_{y_j}^2)^{1/N}\right)$.
This gives an optimal $D_b$ of
$D_b^* = N (\prod_{i=0}^{N-1} a_i \sigma_{y_i}^2)^{1/N}\, 2^{-2R/N}$.
To find the optimal transform, we have to minimize $D_b^* + D_{nb}$.
Since this is hard to do analytically, we use numerical gradient
descent techniques, varying one coefficient at a time.
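
The classic bit allocation step can be illustrated with a short Python
sketch. It uses the standard closed-form high-rate solution: each
coefficient gets the average rate R/N plus half the base-2 logarithm of
the ratio between its weight and the geometric mean of all weights. The
function name `allocate_bits` and the use of a single weight per
coefficient (standing for the product of the channel-dependent factor and
the coefficient variance) are assumptions made for illustration; the
sketch also does not clip negative rates, which is reasonable only in the
high-rate regime.

```python
import math

def allocate_bits(weights, total_rate):
    """High-rate bit allocation for D_b = sum_i w_i * 2**(-2 * R_i).

    weights: positive weights w_i (hypothetical stand-in for a_i * var_i).
    total_rate: total number of bits R to distribute.
    Returns (rates, minimal D_b).
    """
    n = len(weights)
    # log2 of the geometric mean of the weights.
    log_gm = sum(math.log2(w) for w in weights) / n
    rates = [total_rate / n + 0.5 * (math.log2(w) - log_gm) for w in weights]
    # Every term of the sum equals GM * 2**(-2R/N) at the optimum.
    d_opt = n * 2.0 ** log_gm * 2.0 ** (-2.0 * total_rate / n)
    return rates, d_opt
```

For example, weights 4 and 1 with a total of 6 bits give rates 3.5 and
2.5, and both terms of the distortion sum become equal.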

Results show that at high loss rates $D_{nb}$ is the dominating term,
which is minimized by repeating the coefficient with the highest
variance. At low loss rates, $D_b^*$ is the dominating term, which is
minimized by the optimal source coder. Results are shown for a 3 x 2
expansion in Figure 2, where the values of
$\theta_i = \tan^{-1}(F_{i1}/F_{i0})$, $i = 0, 1, 2$, are plotted for
rate constraint $R = 6$ bits and variances $\sigma_0^2 = 4$ and
$\sigma_1^2 = 1$. Also shown is $\varphi_i^T$, which is the $i$th row
of the matrix $F$.

We consider the design of vector quantizers for diversity-based
communication over two channels of capacities $R_1$ and $R_2$ with
possibly differing failure probabilities. A Multiple Description
Vector Quantizer (MDVQ) maps an $n$-dimensional source vector $x$ to
$n$-dimensional code vectors $\hat{x}^0$, $\hat{x}^1$ and $\hat{x}^2$
from the codebooks $X^0$, $X^1$ and $X^2$, respectively. We use the
notation of [1]. Let $d(x,y)$ be a single-letter distortion measure,
and let the random vectors $X$ and $\hat{X}^m$, $m = 0, 1, 2$,
represent the source and decoder outputs, respectively. For given
values of $R_1$, $R_2$, $\lambda_1$ and $\lambda_2$ we then wish to
find an MDVQ which minimizes the average distortion cost

$$D = E\{d(X, \hat{X}^0)\} + \lambda_1 E\{d(X, \hat{X}^1)\} + \lambda_2 E\{d(X, \hat{X}^2)\}.$$

We shall assume $d(x,y) = \|x - y\|^2$. Note that $\lambda_1$ and
$\lambda_2$ may be interpreted as the channel failure probabilities.

The problem of finding the rates asymptotically achievable by MDVQs
of very large dimensions is only partially solved. For references to
the extensive literature on this problem, see [2]. In [1],
Vaishampayan derived an iterative algorithm for the design of
multiple description scalar quantizers (MDSQs), which is closely
related to Lloyd's algorithm for quantizer design. While the algorithm
monotonically decreases the average distortion cost, it is likely to
be trapped in poor local minima unless “good” initial codebooks and
an initial index assignment are used. Vaishampayan recognized this
shortcoming, and proposed heuristic initializations for the special
case where the two channels have identical capacities and failure
probabilities (i.e., $\lambda_1 = \lambda_2$ and $R_1 = R_2$) [1]. But
these do not generalize well to vectors, or to asymmetric channels.

We propose a Deterministic Annealing (DA) approach to the design of
unstructured MDVQs for two-channel diversity systems, where the
channels may have differing capacities and failure probabilities. Our
approach is independent of initialization, does not assume any prior
knowledge of the source density, and avoids many poor local minima of
the cost surface. It consists of iterative optimization of a random
encoder at gradually decreasing levels of randomness, as measured by
the Shannon entropy. At the limit of zero entropy, a hard multiple
description quantizer is obtained. Our approach is inspired by, and
builds on, the DA approach for vector quantizer design [3].

Let us begin by assuming that the three codebooks,
$X^0 = \{x^0_{ij}\}$, $X^1 = \{x^1_i\}$ and $X^2 = \{x^2_j\}$, are
given. We use a random encoding rule, and assign input source vector
$x$ to the index pair $(i,j)$ with probability $q(ij|x)$. These
encoding probabilities are chosen to minimize $D$ subject to a
specified level of randomness, measured by the Shannon entropy.
Correspondingly, we minimize the Lagrangian $F = D - TH$. Here $H$ is
the entropy, and the Lagrange multiplier, $T$, is called the
“temperature” of the system in reference to the statistical physics
analogy. Minimizing $F$ with respect to $q(ij|x)$ gives

$$q(ij|x) = \frac{\exp\left[-\frac{1}{T}\left\{\|x - x^0_{ij}\|^2 + \lambda_1\|x - x^1_i\|^2 + \lambda_2\|x - x^2_j\|^2\right\}\right]}{Z_x},$$

where the normalizing factor $Z_x$ ensures that
$\sum_{ij} q(ij|x) = 1$. Further, the corresponding minimum of $F$ is
easily seen to be
$F^* = \min_{\{q(ij|x)\}} F = -T \sum_x p(x) \log Z_x$. We now find the
optimal sets of reproduction vectors which minimize $F^*$ for this
random encoder:

$$x^0_{ij} = \sum_x p(x|ij)\,x, \qquad x^1_i = \sum_x p(x|i)\,x, \qquad x^2_j = \sum_x p(x|j)\,x.$$

Our algorithm consists of minimizing $F^*$ with respect to the code
vectors, starting at a high temperature and tracking the minimum while
decreasing the temperature.
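
A minimal Python sketch of the random encoding rule may help make the
role of the temperature concrete. It treats a scalar source sample and
computes the Gibbs-form probabilities of assigning it to each index pair
(i, j), i.e. probabilities proportional to exp(-cost/T) with the cost
being the weighted sum of the central and the two side squared errors.
The function name, the argument layout and the toy codebooks in the usage
note are hypothetical; this is not the authors' implementation.

```python
import math

def encoding_probabilities(x, cb0, cb1, cb2, lam1, lam2, temperature):
    """Probabilities q[i][j] of assigning scalar sample x to index pair (i, j).

    cb0[i][j]: central reproduction value; cb1[i], cb2[j]: side values.
    lam1, lam2: weights of the two side distortions.
    """
    costs = [[(x - cb0[i][j]) ** 2
              + lam1 * (x - cb1[i]) ** 2
              + lam2 * (x - cb2[j]) ** 2
              for j in range(len(cb2))]
             for i in range(len(cb1))]
    # Subtract the smallest cost before exponentiating for numerical
    # stability; the shift cancels in the normalization.
    c_min = min(min(row) for row in costs)
    weights = [[math.exp(-(c - c_min) / temperature) for c in row]
               for row in costs]
    z = sum(sum(row) for row in weights)
    return [[w / z for w in row] for row in weights]
```

At a very high temperature the probabilities are nearly uniform over all
index pairs, while at a very low temperature almost all of the mass sits
on the pair with the smallest cost, which corresponds to the hard
quantizer obtained in the zero-entropy limit.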

Scalar (for asymmetric channels) and vector quantizers designed by DA
provided substantial gains over those designed by the iterative
algorithm of [1], even for small codebook sizes. For a 2-d
Gauss-Markov source with $\rho = 0.9$, the average distortion cost of
the DA-designed MDVQ (with $R_1 = R_2 = 1.5$ bpss,
$\lambda_1 = \lambda_2 = 0.01$) was 0.7 dB lower than the best of
twenty different MDVQs designed by the Lloyd approach with random
initializations. Note that the initializations suggested in [1] do not
extend to vectors. These initializations can be used in the design of
scalar quantizers. While the heuristic initializations are better than
random initializations, the DA-designed quantizers outperformed both.
For scalar quantizers of a Gaussian source, the average distortion
costs for DA-designed MDSQs with $R_1 = R_2 = 3$ bpss,
$\lambda_1 = 0.006$, $\lambda_2 = 0.012$ and with $R_1 = 3$ bpss,
$R_2 = 2$ bpss, $\lambda_1 = \lambda_2 = 0.01$ were respectively
0.5 dB and 1.0 dB lower than the best of the MDSQs designed by the
algorithm of [1], with both random and heuristic initializations.

In [2], El Gamal and Cover are credited with this weak
characterization of a multiple description achievable region:
rate-distortion quintuples $(R_1, R_2, D_0, D_1, D_2)$ are achievable
if there exist random variables $I_1$ and $I_2$ jointly distributed
with the source $X$ such that $R_m > I(X; I_m)$, $m = 1, 2$, and
$R_1 + R_2 > I(X; I_1, I_2) + I(I_1; I_2)$, and side and central
reproductions of the forms $\hat{X}^m = g_m(I_m)$, $m = 1, 2$, and
$\hat{X}^0 = g_0(I_1, I_2)$ such that
$E\{d(X, \hat{X}^t)\} \le D_t$, $t = 0, 1, 2$. The DA algorithm for
MDVQ design can be shown to imitate parametric determination of the
convex hull of this achievable region.

Abstract — We propose encoding and decoding methods based on linear
codes to achieve all the integral points in the rate region of the
Slepian-Wolf problem [1]. The extension of these concepts to the
construction of Euclidean-space codes is also studied and analyzed for
the case of trellis codes.

I.  Introduction 

Distributed source coding deals with the efficient encoding of
correlated sources that do not communicate with one another. This was
first introduced in [1], where it was shown that two correlated
memoryless sources, $X$ and $Y$, can be separately compressed at a
total rate approaching the joint entropy. In this paper we focus on
the sensor network system considered by Flynn and Gray [2], as shown
in Fig. 1. Here we have a source $X$ which is observed in a corrupted
form by a number of sensors. Let $Y_i$ denote the signal captured by
the $i$th sensor. Each sensor encodes its message into bits to be
transmitted to a receiver, which forms the optimal reconstruction of
the signal $X$. Here we consider symmetric encoding of correlated
sources in a bandwidth-restricted system.

II.  Symmetric  Encoding  of  Binary  Sources 

We consider an example for illustration of the basic concepts.
Consider a pair² of correlated discrete memoryless sources $X$ and $Y$
such that $X, Y \in \{0,1\}^n$ and $d_H(X, Y) \le t$, where
$d_H(\cdot,\cdot)$ is the Hamming distance. According to [1], $X$ and
$Y$ can be separately compressed at rate pairs $R_1, R_2$ given by

$$R_1, R_2 \ge n - k, \qquad R_1 + R_2 \ge 2n - k, \tag{1}$$

where $k$ meets the sphere packing bound. Let us consider a system
based on an $(n, k, 2t+1)$ binary linear code, $C$, to achieve these
points on the rate region.

Theorem: $C$ achieves all the integral points on (1).

Proof: Let $G$ be the generator matrix of $C$. Let $G_1$ and $G_2$ be
$(n - R_1) \times n$ and $(n - R_2) \times n$ matrices, respectively,
obtained by a partition of the $k$ rows of $G$. Let $C_1$ and $C_2$ be
the linear codes associated with $G_1$ and $G_2$, respectively. The
encoders associated with $X$ and $Y$ send the index of the coset of
these subcodes containing their outcome. Decoding involves finding a
pair of codewords from the given cosets of $C_1$ and $C_2$ which are
closest in distance. Q.E.D.
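
The construction in the proof can be tried out on a toy case. The Python
sketch below assumes the (7,4,3) Hamming code (so t = 1), splits its four
generator rows two and two, and lets each encoder send a label of the
coset of its subcode, which costs 5 bits each, i.e. the symmetric point
R1 = R2 = 5 with R1 + R2 = 2n - k = 10. The particular generator matrix,
the use of the smallest coset member as the coset label, and the
brute-force search are all illustrative choices, not taken from the
paper.

```python
from itertools import product

# Generator rows of a (7,4,3) Hamming code as 7-bit integers (assumed example).
G = [0b1000110, 0b0100101, 0b0010011, 0b0001111]

def span(rows):
    """All binary linear combinations of the given rows."""
    words = []
    for coeffs in product((0, 1), repeat=len(rows)):
        w = 0
        for c, r in zip(coeffs, rows):
            if c:
                w ^= r
        words.append(w)
    return words

C1 = span(G[:2])   # subcode used by the encoder of X
C2 = span(G[2:])   # subcode used by the encoder of Y

def coset_label(word, subcode):
    """Canonical label of the coset word + subcode (its smallest member)."""
    return min(word ^ c for c in subcode)

def decode(label1, label2):
    """Pick the pair from the two cosets that is closest in Hamming distance."""
    pairs = ((label1 ^ c1, label2 ^ c2) for c1, c2 in product(C1, C2))
    return min(pairs, key=lambda p: bin(p[0] ^ p[1]).count("1"))
```

Any two candidate pairs from the same cosets differ, in their sum, by a
nonzero codeword of the full code, whose weight is at least 3, so at most
one pair can be within distance 1; this is why the closest pair
recovers the sources when they differ in at most one position.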

III.  Lossy  encoding  using  trellis  codes: 

Let the random processes $Y_1$, $Y_2$ received by the sensors be given
by $Y_i = X + N_i$ for $i = 1, 2$, where the $N_i$ are independent of
$X$. We need to encode $Y_1$ and $Y_2$ separately to be transmitted to
the receiver to get an optimal reconstruction of $X$. First we
quantize them separately using the quantizers designed for their
marginal distributions. We then exploit the correlation between $Y_1$
and $Y_2$ (using algebraic codes) to reduce the rate of transmission.
We encode the observations in blocks while minimizing the mean squared
error.

¹This work was supported in part by DARPA Grant F29601-99-1-0169 and
NSF (CAREER) Grant MIP 97-03181.

²Extension to more than 2 sources is straightforward.

Let us consider a scalar quantizer with 8 levels. Suppose
$R_1 = R_2 = 2$ bits/source sample. Let $V$ be the set of
reconstruction levels of the scalar quantizer. We partition $V^n$ into
$2^{2n}$ cosets, each containing $2^n$ code vectors. Encoder-1 and
encoder-2 (of $Y_1$ and $Y_2$, respectively) partition $V^n$ in two
different ways. Consider the 4-state Ungerboeck trellis built on $V$
with a rate-2/3 convolutional encoder with generator matrix polynomial
$G(t)$. We form $G_1(t)$ and $G_2(t)$ by partitioning the rows of
$G(t)$. Let $C_1$ and $C_2$ be the codes associated with $G_1(t)$ and
$G_2(t)$, respectively. Each of the encoders sends the index of the
coset of these subcodes containing the observed quantized output, thus
spending 2 bits/source sample. We obtain the most probable
sequence-pair using the Viterbi algorithm in the tensor product
trellis. It can be shown that the decoding complexity is the same as
that of decoding a codeword in the underlying trellis code.

Distance Properties: Let us denote the $i$th coset of $C_j$ as
$C_j(i)$ for $i \in \{1, 2, \ldots, 2^{2n}\}$ and $j = 1, 2$. For any
coset pair $(i, j)$, define $a(i,j)$ as the minimum of the distances
$d_e(c_1, c_2)$ between any two codewords
$(c_1, c_2) \in (C_1(i), C_2(j))$ such that there exists at least one
pair $(c_3, c_4) \ne (c_1, c_2)$ with
$d_e(c_1, c_2) > d_e(c_3, c_4)$, $(c_3, c_4) \in (C_1(i), C_2(j))$. We
define the correlation distance, $d_c$, as follows:

$$d_c = \min_{i,j \in \{1,2,\ldots,2^{2n}\}} a(i,j). \tag{2}$$

Theorem: For all the trellis codes dc > .

Figure 1: Sensor network communication system: Encoders observe a
corrupted version of the source $X$, and transmit their information to
the decoder to get the best reconstruction of $X$. The encoders do not
communicate with each other.

Abstract — It is known that
$\liminf_{n\to\infty} ((2n+1)/2 - \Theta(C_{2n+1})) = 0$ and that the
limsup of this difference is at most 1/4, where $\Theta(G)$ is the
Shannon capacity of the graph $G$. In this paper we prove that the
above limsup is at most 1/6 and conjecture that the limit itself
exists and equals 0. We show that the limsup is small by constructing
large independent sets in the third power of $C_{2n+1}$.

I.  Shannon  Capacity 

The zero error capacity of a discrete noisy channel $C$ was invented
by Shannon [5]. A channel consists of a finite set $X$ of possible
input letters and, for each $x \in X$, a subset $Y_x$ of a (not
necessarily finite) output set $Y$. Here $Y_x$ is the set of possible
outputs of the channel on input $x$. Clearly, if the decoder receives
an output $y \in Y_{x_1} \cap Y_{x_2}$ where $x_1 \ne x_2 \in X$, then
the decoder cannot be certain of the input letter, i.e., it will make
an error in decoding with a certain probability. On the other hand, if
for all $x \ne x' \in X$ we have $Y_x \cap Y_{x'} = \emptyset$, the
decoder will be able to determine the exact input letter: it is the
unique $x \in X$ for which the output $y$ is contained in $Y_x$.

In order to determine the maximum number of letters that can be
transmitted through the channel without the possibility of an error,
Shannon associated a (characteristic) graph $G = G(C)$ to the channel
$C$ as follows. The vertices $V(G)$ of the graph $G$ are labeled by
the possible input letters ($|V(G)| = |X|$), and two vertices
$x_1, x_2$ are adjacent iff $Y_{x_1} \cap Y_{x_2} \ne \emptyset$.
Clearly, the labels of an independent set can be transmitted without
an error. Therefore, the number of letters that can be transmitted by
$C$ without an error is exactly the independence number $\alpha(G)$.

If the sender transmitted $k$ letters, say, $x_1, \ldots, x_k$, i.e.,
the channel has been used $k$ times, then the output of the channel
will also contain $k$ symbols $y_1, \ldots, y_k$, $y_i \in Y_{x_i}$.
This situation can be considered as a single use of the channel $C^k$,
which has input set $X^k$, output set $Y^k$ and the set of possible
outputs $Y_{x_1} \times \cdots \times Y_{x_k}$ on input
$x_1, \ldots, x_k$. The $k$th power $G^k$ of a graph $G$ is defined as
follows. The vertex set of $G^k$ is $V(G^k) = V(G)^k$, and two
vertices $(x_1, x_2, \ldots, x_k)$ and $(x'_1, x'_2, \ldots, x'_k)$
are adjacent iff for all $1 \le i \le k$ either $x_i = x'_i$ or $x_i$
and $x'_i$ are adjacent in $G$. It is easy to see that the number of
sequences of length $k$ that can be transmitted without an error is
the independence number of the $k$th power of $G(C)$.
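
The definition of the graph power is easy to experiment with for cycles.
The Python sketch below checks independence of a vertex set in the kth
power of the cycle on m vertices, using the adjacency rule just stated
(every coordinate equal or adjacent). The function names are
hypothetical. As a usage example it verifies the well-known five-element
independent set {(i, 2i mod 5)} in the second power of the 5-cycle, which
beats the value 4 obtained by squaring the independence number 2 of the
5-cycle itself.

```python
def cycle_adjacent(a, b, m):
    """True if a and b are neighbours on the cycle with vertices 0..m-1."""
    return (a - b) % m in (1, m - 1)

def power_adjacent(u, v, m):
    """Adjacency in the k-th power of the m-cycle (u, v are k-tuples)."""
    if u == v:
        return False
    return all(a == b or cycle_adjacent(a, b, m) for a, b in zip(u, v))

def is_independent(vertices, m):
    """True if no two of the given k-tuples are adjacent in the power graph."""
    vs = list(vertices)
    return not any(power_adjacent(vs[i], vs[j], m)
                   for i in range(len(vs))
                   for j in range(i + 1, len(vs)))

# Five pairwise non-adjacent vertices in the second power of the 5-cycle.
pentagon_set = [(i, (2 * i) % 5) for i in range(5)]
```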

The Shannon capacity of $C$ is defined as

$$\Theta(C) = \sup_k \left(\alpha(G(C)^k)\right)^{1/k} = \lim_{k\to\infty} \left(\alpha(G(C)^k)\right)^{1/k}.$$

Note that the capacity gives a measure of the optimal performance of
the channel when transmitting long sequences. This limit exists by
super-multiplicativity and, since $(\alpha(G))^k \le \alpha(G^k)$ for
an arbitrary graph $G$, it is always at least $\alpha(G)$. It is worth
mentioning that Shannon originally [5] defined the capacity as
$\log \Theta$ (we use the definition and notation of Lovász [4]). Also
notice that $\Theta$ depends on the graph $G(C)$ only, and every graph
is the characteristic graph of some channel. Therefore, we consider
the Shannon capacity of graphs:

$$\Theta(G) = \sup_k \left(\alpha(G^k)\right)^{1/k} = \lim_{k\to\infty} \left(\alpha(G^k)\right)^{1/k}.$$

¹The second author was partially supported by OTKA Grants T 030059 and
T 29074, FKFP 0607/1999.

²The third author was partially supported by NSF grant DMS-9970622.

Since Shannon's invention of the capacity [5] in 1956, it has been one
of the central topics in both information theory and extremal graph
theory. For a more detailed overview of this fascinating topic we
refer the reader to the excellent surveys of Alon [1] and Gargano,
Körner, and Vaccaro [2].

The aim of this paper is to investigate the Shannon capacity of large
odd cycles $C_{2n+1}$. It follows from a result of Hales [3] that for
an infinite subsequence $n_k$, $k = 1, 2, \ldots$, of positive
integers the difference $n_k + 1/2 - \Theta(C_{2n_k+1})$ tends to zero
as $k$ tends to infinity (note $\alpha(C_{2n+1}) = n$), i.e.,

Theorem 1.1 (Hales)

$$\liminf_{n\to\infty} \left(n + 1/2 - \Theta(C_{2n+1})\right) = 0.$$

Hales also showed that the lim sup of this difference is at most 1/4.
Modifying Hales's linear algebraic construction, we show that the
difference cannot be larger than 1/6 as $n$ tends to infinity.

Theorem 1.2

$$\limsup_{n\to\infty} \left(n + 1/2 - \Theta(C_{2n+1})\right) \le 1/6.$$

We strongly believe that the limit as $n$ tends to infinity of the
difference $n + 1/2 - \Theta(C_{2n+1})$ exists and is equal to 0.

Abstract — Two-dimensional run length limited codes satisfying the
square constraint are considered. Let $S$ denote a square of area
$A(S)$ and let $\alpha_n$ be a positive sequence satisfying
$\lim_{n\to\infty} \alpha_n = \infty$. It is shown that the capacity
$C_n$ corresponding to the set $S_n = \alpha_n S \cap \mathbb{Z}^2$
asymptotically satisfies


$$\lim_{n\to\infty} \frac{C_n\,\alpha_n^2}{\log_2 \alpha_n^2} = \frac{4}{A(S)}.$$


I.  Introduction 

One-dimensional run length constraints are important in magnetic
recording applications, and two-dimensional run length constraints
have recently gained interest due to optical recording applications
[1, 2, 3]. A two-dimensional run length constraint requires that a
binary labeling of the integer lattice $\mathbb{Z}^2$ have a specified
minimum and maximum number of zeros between consecutive ones both
horizontally and vertically. Additional constraints, such as run
length constraints along diagonals, can also be imposed in order to
more accurately model optical recording devices. In this paper we
examine the asymptotic behavior of the “square” constraint. The square
constraint imposes the condition that every “one” stored in the plane
must be surrounded by a square of zeros of some given side length. As
the side length of the square grows to infinity, the amount of
information that can be stored per unit area shrinks to zero. In other
words, the capacity of the constraint falls to zero. In this paper we
determine the exact rate at which the capacity of the square
constraint falls to zero as a function of the area of the constraint.

II.  Definitions  and  Results 

Let $\mathbb{R}^2$ denote the two-dimensional plane, and
$\mathbb{Z}^2$ the two-dimensional integer lattice (i.e.,
$\mathbb{Z}^2 = \{(x_1, x_2) : x_1, x_2 \in \mathbb{Z}\}$).

Suppose that $V \subset \mathbb{Z}^2$, such that $(0,0) \in V$. The
code $f : \mathbb{Z}^2 \to \{0,1\}$ satisfies the constraint $V$ (or,
$f$ defines a valid labeling of $\mathbb{Z}^2$ with respect to $V$),
if for every $x \in \mathbb{Z}^2$

$$f(x) = 1 \;\Rightarrow\; f(y) = 0 \quad \text{for } \forall y \in V + x,\ y \ne x. \tag{1}$$
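
Condition (1) can be checked mechanically for labelings with finitely
many ones. The Python sketch below is a hedged illustration: the
representation of a labeling as the set of lattice points labeled 1 (all
other points being 0), the helper `square_constraint`, and the choice of
a closed square with integer half side are assumptions made for this
example only.

```python
def square_constraint(half_side):
    """Lattice points of the square [-h, h] x [-h, h], origin included."""
    h = half_side
    return {(dx, dy) for dx in range(-h, h + 1) for dy in range(-h, h + 1)}

def is_valid_labeling(ones, constraint):
    """Check condition (1) for a labeling given by its set of ones.

    ones: set of lattice points labeled 1 (every other point is 0).
    constraint: set V of offsets containing (0, 0).
    """
    for (x1, x2) in ones:
        for (dx, dy) in constraint:
            # Every other point of the translate V + x must be labeled 0.
            if (dx, dy) != (0, 0) and (x1 + dx, x2 + dy) in ones:
                return False
    return True
```

With half side 1, ones placed at (0,0), (2,0) and (0,2) form a valid
labeling, whereas ones at (0,0) and (1,1) do not, because (1,1) lies in
the square around the origin.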

A subset of $\mathbb{Z}^2$ of the form
$\mathcal{R}_{a,b}^{c,d} = \{(x,y) \in \mathbb{Z}^2 : a \le x \le c,\ b \le y \le d\}$
for some integers $a, b, c, d$ will be called a rectangle. A binary
labeling of the rectangle $\mathcal{R}_{a,b}^{c,d}$ is valid with
respect to a given constraint $V$ if the labeling can be extended to a
labeling of $\mathbb{Z}^2$ satisfying the constraint $V$. Let
$N_V(m,n)$ denote the number of valid labelings of the rectangle
$\mathcal{R}_{0,0}^{m,n}$ with respect to $V$. The capacity $C_V$
corresponding to a set $V \subset \mathbb{Z}^2$ including the origin
is defined as

$$C_V = \lim_{m,n\to\infty} \frac{\log_2 N_V(m-1, n-1)}{mn}.$$

The proof in [1] can be generalized to show that the above limit
exists.

¹This work was supported in part by the National Science Foundation.


III. The asymptotic capacity of the square constraint

In this section $S \subset \mathbb{R}^2$ will denote a square centered
at the origin, whose sides are parallel to the coordinate axes. Let
$\mathcal{S} = S \cap \mathbb{Z}^2$, and let $\alpha_n$ be a sequence
of positive real numbers such that
$\lim_{n\to\infty} \alpha_n = \infty$. Consider the sequence of
capacities $C_n$ corresponding to the constraints
$S_n = \alpha_n S \cap \mathbb{Z}^2$, as $n \to \infty$. In the main
theorem of this section we determine the asymptotic rate at which
$C_n$ goes to zero as $n \to \infty$.

Lemma 1 Let $C_n$ denote the capacity corresponding to the constraint
$S_n$. Write $S_n$ =  for some integer $d$, and consider the set
$\mathcal{S}_n$ = . For every positive integer $n$, $C_n$ satisfies
the inequality

$$C_n \le \frac{\log_2(A(\mathcal{S}_n) + 1)}{A(\mathcal{S}_n)},$$

where $A(\mathcal{S}_n)$ denotes the number of lattice points in
$\mathcal{S}_n$.

Lemma 2 Let $C_n$ denote the capacity corresponding to the constraint
$S_n$. Write $S_n$ =  for some integer $d$, and consider the set
$\mathcal{S}_n$ = . For $\forall \epsilon > 0$,
$\forall y \in \mathbb{Z}^+$ there exists $N$ such that for
$\forall n > N$,


'--(tyt)' 


log2  A(5n)) 

A{S„) 


(2) 


Theorem 1 Let $C_n$ denote the capacity corresponding to the
constraint $S_n = \alpha_n S \cap \mathbb{Z}^2$. Then,


$$\lim_{n\to\infty} \frac{C_n\,\alpha_n^2}{\log_2 \alpha_n^2} = \frac{4}{A(S)}.$$

Abstract — Retro-information is possible in a non-unitary universe.
We give an estimate of the capacity of retro-information channels in
parallel. The result is significantly different from classical channel
capacity.

I.  Introduction 

The  word  retro-information  denotes  the  hypothetical  pos¬ 
sibility  to  transfer  information  backward  in  time.  How  retro- 
information  could  be  made  possible  is  discussed  in  [2,  3]  and  is 
the  extension  of  quantum  information  theory  [1]  over  unitarity 
singularities.  In  this  case  the  measure  operators  are  not  uni¬ 
tary.  In  short,  physical  evidence  of  non-unitarity  could  come 
from  the  following  observations: 

1.  The  unification  of  Quantum  Theory  with  General  Rel¬ 
ativity  poses  problems  to  physicists  and  cosmologists. 

2.  A  non-unifiable  universe  necessarily  carries  symmetry 
violations  which  imply  unitarity  exceptions. 

In this paper we assume that retro-information is possible and
we want to establish quantitative results on the capacity of
several retro-information channels in parallel when they are
subject to a forward coupling, i.e. when the result of the
transmission is made available to the transmitter via a reliable
forward channel before the transmission occurs. To our
knowledge this kind of configuration is new in communication
theory.

II.  Computation  of  channel  capacities 

We consider a $V \times M$ retro-information channel and denote
by $\psi$ the wave function associated with this channel. When $i$
denotes a $V$-ary output symbol and $j$ an $M$-ary setting symbol,
we denote by $A_j^i$ the subset of quantum measurements
which provide output symbol $i$ under setting $j$. We model
the channel via its transfer operator $T$, i.e. the $V \times M$ matrix
whose $(i, j)$ coefficient is $p(A_j^i) = \int_{A_j^i} |\psi|^2$.

In unitary universes, for all $j$ the $p(A_j^i)$'s sum to one, and
we have a classical information transfer probability operator.
In the following we do not assume that the matrix $T$ is unitary,
i.e. $(1, \ldots, 1)$ may not be a left eigenvector.

We now consider $n$ i.i.d. retro-information channels in parallel.
The transfer operator associated with the $n$ channels is $T^{\otimes n}$.
Let $Z_n = z_1 \ldots z_n$ be the $M$-ary codeword of the setting
symbols of the channels and $Y_n = y_1 \ldots y_n$ the $V$-ary codeword of
their output symbols. Denoting $A(Y_n, Z_n) = A_{z_1}^{y_1} \times \cdots \times A_{z_n}^{y_n}$,
we have $p(A(Y_n, Z_n)) = p(A_{z_1}^{y_1}) \times \cdots \times p(A_{z_n}^{y_n})$.

Let $X_n = x_1 \ldots x_n$ be a $V$-ary codeword to be sent via the
channels. In the absence of forward coupling, the setting $Z_n$ is
only a function of $X_n$: $Z_n = Z_n(X_n)$ and

$$P(Y_n | X_n) = \frac{p(A(Y_n, Z_n(X_n)))}{\sum_{Y'_n} p(A(Y'_n, Z_n(X_n)))}.$$

In the presence of a Forward Coupling Function (FCF), $Z_n$ is a
function of both $X_n$ and $Y_n$: $Z_n = Z_n(X_n, Y_n)$ and

$$P_F(Y_n | X_n) = \frac{p(A(Y_n, Z_n(X_n, Y_n)))}{\sum_{Y'_n} p(A(Y'_n, Z_n(X_n, Y'_n)))}.$$


Theorem 1 There exists a set of FCF which raises the capacity
of the $n$ $V$-ary channels in parallel to $nC_F$, where

$$C_F = \min\left\{1, \log \frac{\sum_i \max_j p(A_j^i)}{\sum_i \min_j p(A_j^i)}\right\}.$$


Remark: This new capacity is much greater than the classical
capacity $nC$ without FCF. Figure 1 shows both $C$ and $C_F$
in the binary symmetric case where
$T = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}$ with $p > \frac{1}{2}$.
In this case we have $C = 1 + p \log_2 p + (1-p) \log_2(1-p)$,
while $C_F = \min\{1, \log_2 \frac{p}{1-p}\}$.


Figure 1: Retro-channel capacities $C$ and $C_F$ versus $p$.
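
The two capacity expressions of the Remark can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, the matrix is stored as `T[i][j]` with `i` the output symbol and `j` the setting, and the logarithm in the FCF expression is assumed to be base 2, which matches the binary symmetric case quoted above.

```python
import math

def classical_capacity(p: float) -> float:
    """C = 1 + p log2 p + (1 - p) log2 (1 - p) for the binary symmetric case."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

def fcf_capacity(T) -> float:
    """C_F = min{1, log2(sum_i max_j T[i][j] / sum_i min_j T[i][j])}.

    Assumption: base-2 logarithm, T[i][j] = p(A_j^i).
    """
    num = sum(max(row) for row in T)
    den = sum(min(row) for row in T)
    if den == 0:
        return 1.0  # the ratio is unbounded, so the min is attained at 1
    return min(1.0, math.log2(num / den))

def binary_symmetric(p: float):
    return [[p, 1.0 - p], [1.0 - p, p]]

for p in (0.6, 0.75, 0.9):
    print(p, classical_capacity(p), fcf_capacity(binary_symmetric(p)))
```

For the binary symmetric matrix the ratio reduces to p/(1-p), so the FCF value saturates at 1 as soon as p reaches 2/3, while the classical value stays well below 1 over the same range.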


III.  Conclusion 

We have presented a new information transfer configuration
in information theory based on extrapolated physical
assumptions. Retro-information is a priori a logical challenge
(even when restricted to a short space-time lapse); however, it
can be framed in a consistent axiomatic system which can take
into account paradoxical effects due to forward coupling. The
unexpected results about channel capacities contribute to making
retro-information a very promising area of investigation.

Abstract — The determination of the capacity of a
binary Finite State Channel with memory is in general
a very difficult task. In this paper we present a new
systematic method which amounts to computing the
entropy of the channel error sequence represented as
the output of a stochastic finite state automaton with
state cardinality at most twice that of the original
channel. Each state of the original channel yields a
maximum of two states in the automaton state transition
diagram according to whether the preceding
error symbol was a one or a zero. The error class
$E$ is defined as the class of all states terminating on
an error while the remaining class $\bar{E}$ contains states
with transitions corresponding to no error. Consequently
any path along states in $E$ represents a solid
burst of errors and, reciprocally, all solid bursts of
errors can only result from transitions between states
in $E$. The same property applies to errorless events,
which can only result from transitions between states
in $\bar{E}$. If the channel has $K$ states, the final result is
obtained by computing at most $2K$ series whose elements
are the coefficients of the generating functions
of the runs of 0's after an error terminating in any of
the states of $E$ and of the runs of 1's after an errorless
event terminating in any of the states of $\bar{E}$.

I.  Summary 

In this paper we address the problem of computing the capacity
of a class of finite state binary transmission channels. Our
basic model considers a finite state binary channel with inputs
$\{x_n\}$ and outputs $\{y_n\}$ taking values in $\{0, 1\}$ and such that

$$y_n = x_n \oplus z_n \qquad (1)$$

where $z_n$ is the error with values in $\{0, 1\}$, assumed to be
independent of the input. The generation of the error process
$\{z_n\}$ depends on the current state $s_n \in \{0, 1, \ldots, K-1\}$
according to the law $\Pr\{z_n = 1 \mid s_n = k\} = 1 - p_k$, and the state
process is Markov according to a given transition probability
matrix $Q = [\Pr\{s_n = j \mid s_{n-1} = i\}]$, $i, j \in \{0, 1, \ldots, K-1\}$.
Such a channel model might be adequate to represent a fading
channel for which the error rate increases as the transmitted
signal fades out. It constitutes a generalization of the classical
Gilbert-Elliott channel, which has $K = 2$ states. Results due
to Gilbert [1] are known for this channel in the case where
$p_0 = 1$, $p_1 < 1$, and more generally, by using a different
approach, with nonzero probability of error in both states [2].

'This  work  was  supported  by  NSERC  Grant  OGP0001701. 


In contrast, our proposed method of computation of the capacity
is valid for any $K > 2$ and extends in a systematic
way the original approach of Gilbert based on the analysis of
bursts of consecutive zeros occurring after an error. It turns
out, however, that in general statistics relating to the bursts of
ones are also required. If the channel has $K$ states, the final
result is obtained by computing at most $2K$ series whose
elements are the coefficients of the generating functions of the
runs of 0's after an error terminating in any of the states of
$E$ and of the runs of 1's after an errorless event terminating
in any of the states of $\bar{E}$. As in the case of the well-known
Gilbert channel, an alternate, more elegant sum of series,
which in general converge more slowly, can be used. It is based on
the coefficients of other generating functions representing the
probabilities of the runs of consecutive errors between errorless
events starting in any state of $E$ and the error-free runs
between errors starting from any state in $\bar{E}$.
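
For a numerical feel of the model in (1), the sketch below estimates the entropy rate of the error process by simulation and a normalised forward recursion, then reports one minus that estimate. This is a generic Monte Carlo technique swapped in for illustration, not the series method of the paper. It assumes that the capacity equals one minus the entropy rate of $\{z_n\}$ (the error being independent of the input), and the function names, the uniform initial state and the sample size are all hypothetical choices.

```python
import math
import random

def simulate_errors(Q, p, n, rng):
    """Draw z_1..z_n: the state follows the Markov chain Q and
    Pr{z = 1 | state k} = 1 - p[k]."""
    K = len(Q)
    s = rng.randrange(K)  # assumed uniform initial state
    z = []
    for _ in range(n):
        s = rng.choices(range(K), weights=Q[s])[0]
        z.append(1 if rng.random() < 1.0 - p[s] else 0)
    return z

def entropy_rate_estimate(Q, p, z):
    """Return -(1/n) log2 Pr(z_1..z_n), computed by the forward recursion."""
    K = len(Q)
    alpha = [1.0 / K] * K  # state distribution before the first symbol
    log_prob = 0.0
    for zn in z:
        nxt = []
        for j in range(K):
            emit = (1.0 - p[j]) if zn == 1 else p[j]
            nxt.append(emit * sum(alpha[i] * Q[i][j] for i in range(K)))
        total = sum(nxt)
        log_prob += math.log2(total)
        alpha = [a / total for a in nxt]  # renormalise to avoid underflow
    return -log_prob / len(z)

def capacity_estimate(Q, p, n=20000, seed=0):
    rng = random.Random(seed)
    z = simulate_errors(Q, p, n, rng)
    return 1.0 - entropy_rate_estimate(Q, p, z)

# Gilbert-like example with K = 2: state 0 is error free (p_0 = 1).
print(capacity_estimate([[0.99, 0.01], [0.1, 0.9]], [1.0, 0.7]))
```

When all states share the same $p_k$ the process is memoryless and the estimate should approach one minus the binary entropy of $1 - p_k$, which gives a simple consistency check.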

Abstract — We introduce a new approach for the
study of weight distributions of cosets of the
Reed-Muller code of order 1. We next examine the
impact of our results when some cryptographic criteria
of Boolean functions are considered.

Our main purpose is to study the nonlinearity and other
cryptographic criteria of Boolean functions through the
properties of weight distributions of cosets of the Reed-Muller
code of order 1, denoted by $R(1,m)$. Indeed, it appears in
recent papers that the knowledge of the whole Fourier spectrum
of a given function, and not only its maximal value, is of great
interest from a theoretical point of view as well as for
applications [1, 4]. We begin by giving a general result, based on the
method introduced by Kasami in [5], by using Pless identities.

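Theorem 1 Let $m$ be a positive integer, $m \ge 3$. Consider
any binary linear code $C$ of length $n = 2^m$, dimension
$k = m + 2$ and minimum distance $\delta$. Let us denote by $a_w$ (resp.
$b_w$) the number of codewords of weight $w$ in $C$ (resp. $C^\perp$) and
by $I(\lambda)$ the number

$$I(\lambda) = \sum_{w=1}^{2^{m-1}} (w - 2^{m-1})^2 \left((w - 2^{m-1})^2 - \lambda^2\right) a_w.$$

Assume that $C$ contains the all-one vector $\mathbf{1}$ and that $C^\perp$ is
such that $b_1 = b_2 = b_3 = 0$. Then for any positive integer
$\lambda < 2^{m-1}$, we have

$$I(\lambda) = 2^m \left(3 b_4 - 2^{m-2} \left((2^{m-1} - 1)^2 + (\lambda^2 - 2^{m-1})\right)\right).$$

If $\delta \ge 2^{m-1} - \lambda$ then $I(\lambda) \le 0$, which can be expressed as

$$b_4 \le \frac{1}{3}\, 2^{m-2} \left((2^{m-1} - 1)^2 - 2^{m-1} + \lambda^2\right). \qquad (1)$$

Equality holds in (1) if and only if $\delta = 2^{m-1} - \lambda$ and the
weight distribution of $C$ is: $a_0 = a_{2^m} = 1$ and

$$\begin{array}{c|ccc}
w & \delta & 2^{m-1} & 2^m - \delta \\ \hline
a_w & \dfrac{2^{2m-2}}{(\delta - 2^{m-1})^2} & 2^{m+2} - 2 - \dfrac{2^{2m-1}}{(\delta - 2^{m-1})^2} & \dfrac{2^{2m-2}}{(\delta - 2^{m-1})^2}
\end{array}$$

Since $b_4 \ne 0$, the minimum distance of $C^\perp$ is exactly 4.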

The codes $(x + R(1,m)) \cup R(1,m)$, where $x \notin R(1,m)$,
satisfy the hypothesis of Theorem 1. A coset $x + R(1,m)$ is
said to be almost optimal if its minimum weight is greater
than or equal to $w_0$, where $w_0 = 2^{m-1} - 2^{(m-1)/2}$ for odd $m$,
and $w_0 = 2^{m-1} - 2^{m/2}$ for even $m$. It is called three-valued
almost optimal if it has three weights only, $2^{m-1}$, $w_0$ and
$2^m - w_0$; its weight distribution is the one given in Theorem 1.
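
As a sanity check of Theorem 1 on the smallest case, the following sketch builds the code $(x + R(1,3)) \cup R(1,3)$ by brute force and compares both sides of the identity for $I(\lambda)$. Several points are assumptions rather than statements of the paper: the coset leader $x_1 x_2$ is an arbitrary illustrative choice, $m = 3$ is taken to be admissible, and the sum defining $I(\lambda)$ is taken over $1 \le w \le 2^{m-1}$.

```python
from itertools import product

m = 3
n = 2 ** m
points = list(product((0, 1), repeat=m))  # all x in F_2^m

def truth_table(func):
    return tuple(func(x) % 2 for x in points)

def add(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

# R(1, m): truth tables of all affine functions c0 + c.x
rm1 = {truth_table(lambda x, c=c, c0=c0: c0 + sum(ci * xi for ci, xi in zip(c, x)))
       for c in points for c0 in (0, 1)}

f = truth_table(lambda x: x[0] * x[1])   # hypothetical coset leader x1*x2
code = rm1 | {add(f, r) for r in rm1}    # C = (f + R(1,m)) U R(1,m)

a = [0] * (n + 1)                        # weight distribution a_w of C
for c in code:
    a[sum(c)] += 1

# Dual code by exhaustive search over F_2^n (fine for n = 8).
dual = [v for v in product((0, 1), repeat=n)
        if all(sum(vi & ci for vi, ci in zip(v, c)) % 2 == 0 for c in code)]
b4 = sum(1 for v in dual if sum(v) == 4)

def moment_I(lam):
    """Left-hand side: sum over 1 <= w <= 2^(m-1) (assumed range)."""
    half = 2 ** (m - 1)
    return sum((w - half) ** 2 * ((w - half) ** 2 - lam ** 2) * a[w]
               for w in range(1, half + 1))

def rhs(lam):
    """Right-hand side of the identity in Theorem 1."""
    half = 2 ** (m - 1)
    return 2 ** m * (3 * b4 - 2 ** (m - 2) * ((half - 1) ** 2 + lam ** 2 - half))

for lam in (1, 2, 3):
    print(lam, moment_I(lam), rhs(lam))
```

With this coset the minimum weight is $2 = 2^{m-1} - 2^{(m-1)/2}$, so the code falls in the equality case of (1) for $\lambda = 2$, where both sides vanish.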

Corollary 1 If $x + R(1,m)$ is almost optimal, then

• if $m$ is odd, then $b_4 \le \frac{1}{3}\, 2^{m-2} (2^{m-1} - 1)^2$;

• if $m$ is even, then $b_4 \le \frac{1}{3} \left(2^{m-2} (2^{m-1} - 1)^2 + 2^{2m-3}\right)$.

In both cases, equality holds if and only if $x + R(1,m)$ is
three-valued almost optimal.


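Let $f$ be any Boolean function with $m$ variables. The
Fourier transform of $f$ is

$$\mathcal{F}(f) = \sum_{x \in \mathbb{F}_2^m} (-1)^{f(x)} = 2^m - 2\,wt(f).$$

The set $\{\pm\mathcal{F}(f + \varphi_a) \mid a \in \mathbb{F}_2^m\}$ is called the Fourier spectrum
of $f$, where $\varphi_a$ denotes any linear function.

The nonlinearity of $f$ is equal to $2^{m-1} - (\mathcal{L}(f)/2)$, where

$$\mathcal{L}(f) = \max_{a \in \mathbb{F}_2^m} |\mathcal{F}(f + \varphi_a)|.$$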

Denote by $\Omega_f$ the codeword composed of the values $f(x)$,
$x \in \mathbb{F}_2^m$. We will say that $f$ is almost optimal (or three-valued
almost optimal) when the coset $\Omega_f + R(1,m)$ satisfies this
property. Let $D_a f$ be the derivative of $f$ with direction $a$:
$D_a f(x) = f(x) + f(a + x)$. The main indicator related to the
global avalanche criterion is

$$V(f) = \sum_{a \in \mathbb{F}_2^m} \mathcal{F}^2(D_a f).$$

We will examine the connections between the nonlinearity and
the global avalanche criterion. We first show that if $f$ is almost
optimal then $V(f) \le 2^{2m+1}$ for odd $m$ and $V(f) \le 2^{2m+2}$ for
even $m$, with equality if and only if $f$ is three-valued almost
optimal.
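
The quantities above are easy to compute for small $m$. The sketch below is an illustration under stated assumptions: $\varphi_a(x)$ is taken to be the inner product $a \cdot x$, the indicator is taken in its usual sum-of-squares form $V(f) = \sum_a \mathcal{F}^2(D_a f)$, and the example function $x_1 x_2 + x_3$ is a hypothetical choice, not one from the paper.

```python
def walsh_spectrum(tt, m):
    """F(f + phi_a) for every a, with phi_a(x) = a.x and tt[x] = f(x)."""
    return [sum((-1) ** (tt[x] ^ (bin(a & x).count("1") & 1))
                for x in range(2 ** m))
            for a in range(2 ** m)]

def nonlinearity(tt, m):
    """2^(m-1) - L(f)/2, where L(f) is the largest spectrum magnitude."""
    return 2 ** (m - 1) - max(abs(v) for v in walsh_spectrum(tt, m)) // 2

def gac_indicator(tt, m):
    """V(f) = sum over a of F(D_a f)^2, with D_a f(x) = f(x) + f(x + a)."""
    total = 0
    for a in range(2 ** m):
        corr = sum((-1) ** (tt[x] ^ tt[x ^ a]) for x in range(2 ** m))
        total += corr ** 2
    return total

m = 3
# Hypothetical example f(x1, x2, x3) = x1*x2 + x3; bit i of x holds x_{i+1}.
tt = [((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1) for x in range(2 ** m)]

print(sorted(set(walsh_spectrum(tt, m))))
print(nonlinearity(tt, m), gac_indicator(tt, m))
```

For this function the spectrum takes only the values 0 and ±4, so it is three-valued almost optimal for $m = 3$, and the indicator meets the bound $2^{2m+1} = 128$ with equality.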

We next study the restrictions of a Boolean function $f$ to
each coset of any linear subspace of $\mathbb{F}_2^m$. We notably establish
a relation between the Fourier spectrum of $f$ and the Fourier
spectra of its restrictions. This leads us to obtain some
characterizations of bent functions, of three-valued almost optimal
functions and of almost optimal functions which have a linear
structure. We give a full explanation of the links between bent
functions and three-valued almost optimal functions.

Abstract — We propose a construction of resilient
functions with $n$ binary input variables and $m$ binary
output variables. In certain cases, the nonlinearity of
these functions is the highest nonlinearity known.

I.  Introduction 

An $n$-input $m$-output function, $F(x_1, \ldots, x_n) = (f_1, \ldots, f_m)$,
is a set of $m$ Boolean functions, $f_1, \ldots, f_m$, where each
$f_i : \mathbb{F}_2^n \to \mathbb{F}_2$. One of the many applications of these functions
may be the realization of S-boxes in DES-like block ciphers.
Different properties and criteria for such functions have been
studied, see e.g. [3]. Here, we consider two criteria, namely
nonlinearity and resiliency.

The previous work on nonlinear resilient functions is mostly
based on the two constructions presented in [2] and [4]. As
proved in [2], there is a tradeoff between the nonlinearity and
resiliency when the two constructions are compared. The
construction in [2] gives higher nonlinearity, while in [4] a larger
resiliency could be obtained for the same $n$ and $m$.

II.  New  Construction 

The construction presented here is an extension of the
design of nonlinear Boolean functions presented in [1]. It yields
highly nonlinear resilient functions for any given input triple
$(n, m, t)$, where $t$ is the order of resiliency. A well-known
result states that a function $F(x_1, \ldots, x_n) = (f_1, \ldots, f_m)$ is
an $(n, m, t)$-resilient function if and only if all nonzero
linear combinations of $f_1, \ldots, f_m$ are $(n, 1, t)$-resilient functions.
Furthermore, the nonlinearity of $F(x) = (f_1(x), \ldots, f_m(x))$,
denoted by $N_F$, is defined as the minimum nonlinearity of all
nonzero linear combinations of the component functions of $F$.

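Theorem 1 Let $n$, $m$, $t$ and $d$ be four positive integers with
$n \ge 4$, $1 \le t \le n - 3$, $1 \le d \le n - t$, $m \le n - d$ and
$S_{n,m,t,d} = \{A_y^i \in \mathbb{F}_2^{n-d}, i = 1, \ldots, m \mid wt(A_y^i) \ge t + 1, y \in \mathbb{F}_2^d\}$. For
any $a \in S_{n,m,t,d}$, let $s_a^c = |\{y \in \mathbb{F}_2^d \mid \sum_{i=1}^m c_i A_y^i = a\}|$ and
$s^* = \max_{c \in \mathbb{F}_2^{m*}} \max_a s_a^c$. We now define a function $F : \mathbb{F}_2^n \to \mathbb{F}_2^m$
by

$$F(y, x) = (A_y^1 x, A_y^2 x, \ldots, A_y^m x),$$

where $y = (y_1, \ldots, y_d) \in \mathbb{F}_2^d$, $x = (x_1, \ldots, x_{n-d}) \in \mathbb{F}_2^{n-d}$. Then
the following holds:

1. $F$ is uniformly distributed if $\sum_{i=1}^m c_i A_y^i \ne 0$, for any
$c \in \mathbb{F}_2^m$, $c \ne 0$.

2. $F$ is $t$-resilient if for any $a \in \mathbb{F}_2^{n-d}$ with $0 \le wt(a) \le t$ and
$c \in \mathbb{F}_2^m$, $c \ne 0$, it holds that $\sum_{i=1}^m c_i A_y^i \ne a$.

3. $N_F = 2^{n-1} - s^* 2^{n-d-1}$.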
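
A toy instance makes the three statements concrete. The sketch below is only an illustration: the parameters $(n, d, m, t) = (5, 1, 2, 1)$ and the vectors in `A` are hypothetical choices, resiliency is tested through the standard Walsh-coefficient characterization (all coefficients vanish at points of weight at most $t$), and the nonlinearity is computed by brute force rather than taken from part 3.

```python
from itertools import product

n, d, m, t = 5, 1, 2, 1   # hypothetical small parameters
k = n - d                 # number of x-variables

# Illustrative choice: for each y, the vectors A_y^1..A_y^m as k-bit integers.
A = {0: (0b1100, 0b0011),
     1: (0b1010, 0b1111)}

def parity(v):
    return bin(v).count("1") & 1

def combo(y, c):
    """Sum over i of c_i * A_y^i in F_2^k."""
    v = 0
    for ci, ai in zip(c, A[y]):
        if ci:
            v ^= ai
    return v

nonzero_c = [c for c in product((0, 1), repeat=m) if any(c)]

def s_star():
    """Largest number of y giving the same combined vector, over all c != 0."""
    best = 0
    for c in nonzero_c:
        vecs = [combo(y, c) for y in A]
        best = max(best, max(vecs.count(v) for v in vecs))
    return best

def walsh(c, b):
    """Walsh coefficient of the combination c.F at the n-bit point b."""
    total = 0
    for y in A:
        for x in range(2 ** k):
            z = (y << k) | x
            total += (-1) ** (parity(combo(y, c) & x) ^ parity(b & z))
    return total

def nonlinearity():
    worst = max(abs(walsh(c, b)) for c in nonzero_c for b in range(2 ** n))
    return 2 ** (n - 1) - worst // 2

def is_t_resilient():
    low = [b for b in range(2 ** n) if bin(b).count("1") <= t]
    return all(walsh(c, b) == 0 for c in nonzero_c for b in low)

print(s_star(), nonlinearity(), is_t_resilient())
```

Here every nonzero combination of the two vectors has weight at least 2 and no combined vector repeats across $y$, so $s^* = 1$ and the brute-force nonlinearity agrees with $2^{n-1} - s^* 2^{n-d-1} = 8$.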

'This  work  was  supported  in  part  by  Swedish  Research  Council 

Note that each component function $A_y^i x$ is a concatenation
of $2^d$ distinct $t$-resilient linear functions on $\mathbb{F}_2^{n-d}$. Hence,
it is convenient to introduce a $2^d \times m$ matrix $A$, having as
each entry an $(n-d, 1, t)$-resilient Boolean function defined
uniquely by a vector $A_y^i \in \mathbb{F}_2^{n-d}$ s.t. $wt(A_y^i) \ge t + 1$.
Due to the first two parts of Theorem 1 each row of $A$ must
span an $[n-d, m, t+1]$ linear code.

The parameter $s^*$ is the maximum number of repetitions of any
vector in any nonzero linear combination of the columns of $A$. It
can be proved that the nonlinearity is maximized if $s^* = 1$,
which means that each vector may only appear once in any
nonzero linear combination of the columns of $A$.

According to the third part of Theorem 1 the nonlinearity
is maximized when $d$ is maximized, where $d$ must satisfy the
trivial upper bound $\binom{n-d}{t+1} + \binom{n-d}{t+2} + \cdots + \binom{n-d}{n-d} \ge 2^d$. We
can show that given an $[n-d, m, t+1]$ linear code, we are
able to fill $2^m - 1$ out of $2^d$ rows of $A$ without violating the
restrictions given by Theorem 1. Thus, given $\lceil 2^d/(2^m - 1) \rceil$
nonintersecting $[n-d, m, t+1]$ linear codes the matrix $A$ can
be constructed.

The results presented here are obtained using computer
search for nonintersecting linear codes. A comparison with
the construction described in [2] is presented in Table 1 in the
case of 2-resilient functions. Such a favorable comparison can
be extended to any order of resiliency if the number of input
variables $n$ is not too large, say for $n < 25$.


 N_F       n = 9         n = 10        n = 11        n = 12
  m      Our    [2]    Our    [2]    Our    [2]    Our    [2]
  2      240    192    480    384    992    896   1984   1792
  3      192     -     448     -     960     -    1984   1792
  4      128     -     384     -     896     -    1920     -
  5        0     -     256     -     768     -    1792     -
  6        0     -       0     -     512     -    1536     -

Table 1: Comparison of $N_F$ for 2-resilient functions

Abstract — It is proved by means of a Ramsey-like
technique that for each positive integer $k$ there exists
a minimal nonnegative integer $p'(k)$ such that any $(n-k)$th
order correlation-immune function of $n$ binary input
variables, $f \not\equiv$ const, depends nonlinearly on at most
$p'(k)$ inputs. It is proved that the number of $(n-k)$th
order correlation-immune functions of $n$ binary input
variables, $k$ = const, $n \to \infty$, is polynomial. For
$k = 1, 2, 3$ the exact formulas for the numbers of such
functions are obtained.

We consider memoryless Boolean functions $f$: $GF(2)^n \to
GF(2)$, $x \mapsto f(x)$. The number of 1's in the table of the
Boolean function $f$ is given by its Hamming weight $W_f$. A
memoryless function $f$ is said to be correlation-immune of
order $m$, with $1 \le m \le n$, if the output of $f$ and any $m$
input variables are statistically independent. This concept
was introduced by Siegenthaler [3]. In an equivalent
non-probabilistic formulation (see [4]) the Boolean function $f$ is
called correlation-immune of order $m$ if $W_{f_1} = W_{f_2}$ for any
two of its $(n-m)$-input subfunctions $f_1$ and $f_2$.

In [1] it was pointed out that a correlation-immune function
is a particular case of an orthogonal array (OA); namely, an $m$th
order correlation-immune function of $n$ inputs with weight $W_f$
corresponds to a simple $(W_f, n, 2, m)$-OA. Note that for the
maximal $m$ such that a function is correlation-immune of order $m$,
the value $m + 1$ was called the dual distance of a code (the code
being the characteristic set of the function) by Delsarte [2] and
declared to be one of the "four fundamental parameters of a code". Of
course, Delsarte did not use the words "correlation-immune".

In this work we consider $(n-k)$th order correlation-immune
functions of $n$ inputs in the case $k$ = const, $n \to \infty$,
i.e. higher order correlation-immune functions. Further,
$(n-k)$th order correlation-immune functions of $n$ inputs are
called $k$-functions. Also we assume that if $n < k$ then any
Boolean function of $n$ inputs is a $k$-function.

The polynomial representation of $f$ is called the algebraic
normal form (ANF) of the function. The degree of $f$,
denoted by $\deg(f)$, is defined as the number of variables in
the longest term in the ANF of $f$. The terms of length 1 are
called linear terms. We say that the Boolean function
$f(x_1, x_2, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)$ depends on the input $x_i$ linearly
if the variable $x_i$ is present in the ANF of $f$ only as
a linear term $x_i$. In all other cases we say that the function
$f$ depends on the input $x_i$ nonlinearly (including the case
when the input $x_i$ is fictitious for the function $f$).

Theorem 1. For each positive integer $k$ there exists a
minimal nonnegative integer $p'(k)$ such that any $k$-function $f$,
$f \not\equiv$ const, depends nonlinearly on at most $p'(k)$ inputs.


A  Boolean  function  /  is  called  balanced  if  Wf  =  Wj. 

We say that a $k$-function $f(x_1, x_2, \ldots, x_m)$ is a reproductive
$k$-function if the function $g(x_1, x_2, \ldots, x_m, y) =
f(x_1, x_2, \ldots, x_m) \oplus y$ is a $k$-function.

Remark. A $k$-function $f(x_1, x_2, \ldots, x_m)$ is a reproductive
$k$-function iff at least one of the following two conditions holds:
a) $m < k$; b) the function $f$ is balanced.

Corollary from Theorem 1. For each positive integer
$k$ there exists a minimal nonnegative integer $p(k)$ such that any
reproductive $k$-function $f$ depends nonlinearly on at most $p(k)$
inputs.

It is obvious that $p(k) \le p'(k)$. Below we show that
$p(k) \ne p'(k)$ at least for $k = 2, 3$.

Theorem 2. For $n > p'(k)$ the number $N(n, n-k)$ of
$(n-k)$th order correlation-immune functions of $n$ inputs is
expressed by the following formula:

$$N(n, n-k) = \sum_{i=0}^{p(k)} A(k, i) \binom{n}{i} + 2,$$

where $A(k, i)$ is the number of $i$-input reproductive
$k$-functions that depend on all inputs $x_1, x_2, \ldots, x_i$ nonlinearly.

Corollary from Theorem 2. The asymptotics of the
number $N(n, n-k)$ of $(n-k)$th order correlation-immune
functions of $n$ inputs, $k$ = const, $n \to \infty$, is expressed by
the following formula:

$$N(n, n-k) \sim \frac{A(k, p(k))}{p(k)!}\, n^{p(k)}.$$

Theorem 3. $p(1) = p'(1) = 0$, $p(2) = 1$, $p'(2) = 3$,
$p(3) = 4$, $p'(3) = 6$;

$N(n, n-1) = 4$ for $n > 0$,

$N(n, n-2) = 2n + 4$ for $n > 3$,

$N(n, n-3) = n^4 - (2/3)n^3 + (5/3)n + 4$ for $n > 6$.

Theorem 4. $p(k) \ge 3 \cdot 2^{k-2} - 2$.
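
The first two counts of Theorem 3 can be checked by exhaustive search for very small $n$. The sketch below is a brute-force illustration, not the counting argument of the paper: it applies the equal-weight criterion quoted earlier (all subfunctions obtained by fixing `order` inputs have the same weight), and the function name `count_ci` is hypothetical. It is only practical up to about $n = 4$.

```python
from itertools import combinations, product

def count_ci(n, order):
    """Count n-input Boolean functions that are correlation-immune of the
    given order, using the equal-weight subfunction criterion."""
    size = 2 ** n
    # One bit mask of truth-table positions per way of fixing `order` inputs.
    masks = []
    for idx in combinations(range(n), order):
        for vals in product((0, 1), repeat=order):
            mask = 0
            for x in range(size):
                if all((x >> i) & 1 == v for i, v in zip(idx, vals)):
                    mask |= 1 << x
            masks.append(mask)
    count = 0
    for tt in range(2 ** size):  # tt encodes the whole truth table
        weights = {bin(tt & mask).count("1") for mask in masks}
        if len(weights) == 1:
            count += 1
    return count

print(count_ci(3, 2))  # N(3, 2), i.e. k = 1
print(count_ci(4, 2))  # N(4, 2), i.e. k = 2
```

Note that the formula for $k = 2$ is stated only for $n > 3$: for $n = 3$ the search also finds unbalanced first-order correlation-immune functions, so the count there is larger than $2n + 4$.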

Abstract — By using the action of the Frobenius
group it is possible to decode far beyond the
error-correcting capability of Goppa codes provided the
error vector has a definite structure. In particular
cases the generation of these patterns can be easily
described and gives a number of decodable patterns
large enough to rule out enumeration. We show that
it is possible to use this property to strengthen
McEliece-type cryptosystems against attacks based
on random decoding.

I.  Introduction 

We present a method to decode error patterns of large
weight in Goppa codes by using a subgroup of the
automorphism group of the Goppa codes. Namely, we use the group
generated by the Frobenius automorphism. By its action on
the large weight error patterns one obtains error patterns of
weight less than the error-correcting capability of the code.

The efficiency of this method and the number of decodable
patterns depend on the nature of the Frobenius group. In a
well-chosen case we show that it is possible to decode a large
number of patterns with weight one and a half times the
error-correcting capability of the code. Since these patterns are
easily generated, they can be used to improve the work factor
of the random decoding attack on the McEliece public-key
cryptosystem without increasing the size of the public key.

II.  Automorphism  group  of  Goppa  codes 

Goppa codes are a subfamily of alternant codes generated
by a polynomial $g$ of degree $t$ over a finite field $GF(2^m)$. The
set $L = (\alpha_1, \ldots, \alpha_n)$ of elements of $GF(2^m)$ that are not roots of
$g$ is called the generating vector. The Goppa code $\Gamma(L, g)$ of
length $n = |L|$ is the set of binary words $a = (a_{\alpha_1}, \ldots, a_{\alpha_n})$
such that

$$H \cdot {}^t a = 0$$

where $H = (\alpha_j^i / g(\alpha_j))_{i,j}$.

Generally, the automorphism group of a Goppa code is
trivial. However, we showed that when the generating
polynomial $g$ has coefficients in a subfield $GF(2^s)$ of $GF(2^m)$,
the automorphism group contains the group generated by the
Frobenius automorphism $\sigma$ of $GF(2^m)/GF(2^s)$ [2].

III.  Tower  decodable  patterns 

Suppose one receives the word

$$c = m + e$$

where $m$ is a word in $\Gamma(L, g)$, $g$ is taken over $GF(2^s)$ and $e$ is an
error vector. If the weight of $e$ is less than the error-correcting
capability $t$ of the code then one recovers $m$ easily. Since the
automorphism group of the code contains the group generated
by the Frobenius automorphism $\sigma$ of $GF(2^m)/GF(2^s)$, any
linear combination $\sum_{i=0}^{m/s-1} \epsilon_i \sigma^i(m)$ of the transforms of $m$
through the Frobenius is in $\Gamma(L, g)$. The transform of the
error pattern $e$ becomes $\sum_{i=0}^{m/s-1} \epsilon_i \sigma^i(e)$ and we say that

Definition 1 $e$ is tower decodable in $\Gamma(L, g)$ if

1. There exist linear combinations indexed by $u$

$$\mathcal{E}^{(u)} = \sum_{i=0}^{m/s-1} \epsilon_i^{(u)} \sigma^i(e)$$

of the $\sigma^i(e)$ such that the $\mathcal{E}^{(u)}$ can be decoded in the
Goppa code $\Gamma(L, g)$,

2. the knowledge of some of the $\mathcal{E}^{(u)}$ enables the receiver
to recover the error pattern $e$ with a certain probability.

IV.  Application  to  McEliece  cryptosystem 

We take the McEliece parameters [3], that is, the private
key is a generating matrix of a Goppa code $\Gamma(L, g)$ where $L$
is an indexation of $GF(2^{10})$ and $g$ is irreducible over $GF(2^{10})$
of degree 50. The error-correcting capability of the code is 50.
The public key is the scrambled private key. The best known
work factor for the attack by random decoding is $2^{64}$ [1].

In our scheme we take $g$ over $GF(2^2) \subset GF(2^{10})$. Hence,
the Frobenius group has cardinality 5. The public key is the
scrambled private key reordered according to the orbits of the
Frobenius group. By randomly choosing 25 orbits
out of 204 and randomly placing 3 bits on every chosen orbit
we construct tower decodable patterns of weight 75. The
number of such patterns is $2^{188}$. For the decoding, we need to
decode at most two words. The size of the public key is the
same as in the original scheme whereas the work factor of the
attack by random decoding becomes $2^{90}$.
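
The figures quoted in this section can be reproduced with a few lines of arithmetic. The sketch below is a counting illustration only. It assumes the 204 orbits are the orbits of size 5 formed by the elements of $GF(2^{10})$ outside $GF(2^2)$, and that the patterns are counted as a choice of 25 orbits followed by a choice of 3 positions in each.

```python
import math

m, s = 10, 2                                # GF(2^10) over GF(2^2)
orbit_size = m // s                         # order of the Frobenius group
orbits = (2 ** m - 2 ** s) // orbit_size    # assumed: orbits outside GF(2^2)
chosen, bits_per_orbit, t = 25, 3, 50

weight = chosen * bits_per_orbit
patterns = (math.comb(orbits, chosen)
            * math.comb(orbit_size, bits_per_orbit) ** chosen)
log2_patterns = math.log2(patterns)

print(orbits, weight, weight / t, log2_patterns)
```

Under these assumptions the orbit count and the pattern weight match the text exactly, and the base-2 logarithm of the pattern count lands within about one unit of the quoted exponent 188.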

Abstract- A method for analyzing the error performance
of the iterative threshold decoder using strict-sense
multi-orthogonal convolutional codes, where the multiplicity order
is larger than or equal to the number of iterations, is presented.
This allows a tractable analysis, since all random variables
are considered independent at each decoding step. The
analysis provides a good prediction of the error probability
convergence value of the iterative decoding process using
strict-sense doubly orthogonal convolutional codes.

I.  Introduction 

A novel iterative threshold decoding procedure without
interleaving has been introduced in [1]. This technique uses
Convolutional Self Doubly Orthogonal Codes, CSO$^2$C. With
these codes, the need for interleaving to obtain independent
observables at each iteration is removed, and hence the
procedure does not require an interleaver, either at the
encoder or at the decoder. The double
orthogonality property of the code may be defined either in
the wide sense (CSO$^2$C-WS) or in the strict sense
(CSO$^2$C-SS) [2]. The rate 1/2 CSO$^2$C-WS codes allow some repetitions
of observables which produce correlated inputs at the second
iteration. For the rate 1/2 CSO$^2$C-SS code, all repetitions of
observables are avoided by using a parallel structure of the
encoder. The definition of CSO$^2$C-SS may also be extended
to multiple orthogonality of order $M$, where no repetition is
possible over $M$ consecutive iterations. Such codes are called
Strict-Sense Convolutional Self Multi-Orthogonal Codes,
CSO$^M$C-SS.

In this paper, we present a method for analyzing the error
performance of the iterative threshold decoder using
CSO$^M$C-SS where $M$ is larger than or equal to the number of
iterations.

II. Bit Error Performance for Single Decoding Iteration

A threshold decoder produces at its output an approximated
Maximum A Posteriori (MAP) value, $\lambda(i)$, for each
information symbol $u_i$ to be decoded at time $i$. This MAP
value corresponds to a summation of $J$ parity-check
equations $\psi_j(i)$, at time $i$, over the currently received
information symbol $y_u(i)$, that is:

$$\lambda(i) = y_u(i) + \sum_{j=1}^{J} \psi_j(i) \qquad (1)$$

The parity-check equations $\psi_j(i)$ are obtained using
add-min operators as defined in [1]. This operator represents an
approximation of the log-likelihood ratio (LLR) of the
modulo-2 sum of binary random variables. Since CSO$^2$C is
used, $\lambda(i)$ is a sum of independent random variables (RV)
and the Probability Density Function (PDF) of $\lambda(i)$ is the
convolution of the PDFs of each RV which belongs to the
sum given by (1). Even though the RVs are not identically
distributed, they are somewhat similar and hence their sum
tends to be Gaussian. The average value $\Lambda$ of $\lambda(i)$ is
obtained as a sum of the means of all RVs. Similarly, its
variance $\sigma^2$ may also be expressed as a sum of the
variances of all RVs $y_u(i)$ and $\psi_j(i)$. Therefore, the bit error
probability, $P_b(E)$, may be approximated by:

Pb(E  )  ^  j  exp[-  (l  -IJ/20I  y .  4A*  )  (2) 

J27102 

In order to evaluate (2), we have determined the PDF of
the RV $\psi_j(i)$ where all its $N$ constituent RVs are Gaussian
distributed with mean $m_l$ and variance $\sigma_l^2$, $l = 1, 2, \ldots, N$.
Using (2) and considering a feedback threshold decoder, we
have calculated $P_b(E)$ for different CSO$^2$C. Comparisons
between theoretical and simulation results show only a small
discrepancy, thus confirming the validity of the approach.
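
The Gaussian approximation described above can be sketched in a few lines. This is an illustration under explicit assumptions, not the expression (2) of the paper: the decision threshold is assumed to be 0, the transmitted symbol is assumed to give a positive mean, and the component means and variances passed in are hypothetical inputs.

```python
import math

def gaussian_error_probability(means, variances):
    """Tail probability below 0 of a Gaussian whose mean and variance are
    the sums of the component means and variances (independence assumed)."""
    big_lambda = sum(means)      # mean of the sum
    sigma2 = sum(variances)      # variance of the sum
    return 0.5 * math.erfc(big_lambda / math.sqrt(2.0 * sigma2))

# Hypothetical example: the received symbol plus J = 3 parity-check terms.
print(gaussian_error_probability([1.0, 0.4, 0.4, 0.4], [0.5, 0.3, 0.3, 0.3]))
```

Adding more independent terms with positive mean increases the ratio of mean to standard deviation, which is why the approximated error probability drops as $J$ grows.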

III. Extension to Iterative Threshold Decoding without Interleaving

The above analysis is extended to obtain $P_b(E)$ for
multiple iterations where the code used is CSO$^M$C-SS.
Hence, all RVs are considered independent at each decoding
step. The main idea behind this approach is to apply
recursively the method developed in Section II. We
consider, for the current iteration $m$, that inputs provided by
the previous iteration $(m-1)$ are Gaussian distributed with
mean $\Lambda^{(m-1)}$ and variance $\sigma^2(m-1)$. Simulation results for
rate 1/2 CSO$^2$C-SS codes having $J$ parity-check equations
coincide with those predicted using the theoretical analysis
where rate 1/2 CSO$^M$C-SS codes with the same value of $J$ are used. In
addition to validating the analysis, this also shows that only
double orthogonality is in effect needed in order to obtain
good error performance. The results also indicate that
CSO$^M$C-SS codes converge more quickly than CSO$^2$C-SS
codes.

IV.  Conclusion 

We  have  presented  a  method  for  evaluating  the  bit  error 
probability  of  iterative  threshold  decoding  using  CSOMC-SS. 
This  method  is  based  on  the  evaluation  of  the  probability 
density  function  of  the  approximated  MAP  value  obtained  at 
the  output  of  the  threshold  decoder.  The  multiple 
orthogonality is shown to be useful in the analysis of the
performance of CSO2C-SS codes. Furthermore, the results
indicate that doubly orthogonal CSO2C-SS codes may be
sufficient to obtain a good error performance.
Abstract — New bandwidth-efficient Type-I and
Type-II hybrid-ARQ (HARQ) schemes using Turbo
Trellis Coded Modulation (TTCM) are proposed.
These schemes combine the power efficiency of turbo
codes with the bandwidth efficiency of Trellis Coded
Modulation (TCM) to give efficient FEC/ARQ system
designs.

I.  Introduction 

When data is transmitted in the form of packets, it is common
to use Automatic Repeat reQuest (ARQ) or retransmission
techniques in addition to Forward Error Correction (FEC) to
improve the performance of a communication system. Several
code combining retransmission schemes using low rate turbo
codes have been shown to yield good performance [1].
Separately, sequence combining TCM schemes have been proposed
for systems requiring higher throughputs [2]. In this paper, we
present several TTCM schemes for use in ARQ systems, thereby
combining the advantages of TCM with those of turbo codes in
a retransmission environment.

II.  System  Description 

We assume a selective repeat ARQ scheme with suitably large
buffers at the transmitter and receiver. Furthermore, we assume
an error free feedback channel over which positive (ACK)
or negative (NACK) acknowledgements can be sent. The
underlying TTCM scheme used here is the one proposed by
Berrou et al. [3]. A coherent receiver model is assumed. The
data sequence consists of information bits and a 16-bit CRC
sequence. The sequence is fed into the Turbo encoder whose
output is punctured to the desired rate and formatted into
P-symbol data packets, $U = (u_1, \ldots, u_P)$, where each symbol
$u_i$ consists of m bits (i.e., a signal constellation size of $2^m$).

The following HARQ schemes using TTCM are considered.
In Scheme 1, the same packet is retransmitted until the
receiver accepts it as error free or until a preset maximum
allowed number of retransmission attempts is reached. The
error prone packets in the previous transmissions are discarded.
In Scheme 2, also known as an average diversity combining
scheme, copies of a retransmitted packet are combined into a
single packet of the same blocksize by averaging the soft
demodulated values of each packet and then decoding. Scheme 3
is an incremental redundancy scheme where received packets
are concatenated to form noise-corrupted codewords from
increasingly longer and lower rate codes. During the first
transmission only the information bits are sent. Subsequently, the
check digits are incrementally transmitted to adaptively meet
the error performance requirements of the system. Finally, we

1  This  work  was  supported  by  Motorola  Inc.,  NASA  grant  NAG 
assume the blocksize is the same for all transmissions in order
to keep network overhead to a minimum.
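
The averaging step of Scheme 2 can be illustrated with a minimal sketch. The function name `average_combine` and the sample soft values are hypothetical and only show the idea of combining copies of equal blocksize before decoding; they are not taken from the paper.

```python
def average_combine(copies):
    """Average the soft demodulated values of several copies of one packet.

    `copies` is a list of equally long lists of soft values, one list per
    received (re)transmission. The result has the same blocksize and would
    be handed to the single turbo decoder.
    """
    if not copies:
        raise ValueError("need at least one received copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same blocksize")
    # Position-wise mean over all copies received so far.
    return [sum(vals) / len(copies) for vals in zip(*copies)]


# Hypothetical soft values for a first transmission and one retransmission.
first = [0.9, -1.2, 0.1]
retry = [1.1, -0.8, -0.5]
combined = average_combine([first, retry])
```

Because the combined packet keeps the original blocksize, the same decoder can be reused after every retransmission, which matches the single-decoder remark in the conclusions.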

III.  Numerical  Results 

Scheme 1 and Scheme 2 employing TTCM use a rate 2/3
Turbo code obtained by puncturing parity bits from a 4-state
$(7, 5)_{\text{octal}}$ constituent recursive convolutional encoder along
with Gray mapping to an 8PSK signal constellation. The turbo
decoder uses the APP algorithm. After every iteration, the
CRC is checked. Scheme 3 employing TTCM uses a mother
turbo code of rate 1/3 mapped to an 8PSK constellation, and
higher rates are achieved by puncturing. Scheme 1 employing
TCM uses a rate 2/3 convolutional encoder obtained by
puncturing a rate 1/2 16-state $(23, 35)_{\text{octal}}$ convolutional encoder.
The throughputs are plotted in Fig. 1 for an information
blocksize of 512 on an AWGN channel.


Fig.  1:  Throughput  comparison  of  various  schemes 


IV.  Conclusions 

In this paper, a new application of turbo codes to bandwidth
efficient ARQ schemes is introduced. Since the combining
schemes described here use a single decoder to decode any
received packet or any combination of received packets, the
implementation of these protocols requires only minor
modifications to the transmitting and receiving systems of a
standard turbo code.
Convergence of Relative Frequency of Occurrence of
Error Bursts on Channels with Memory
Abstract — The exponential convergence of the relative
burst weight $W_b(Z_1 \ldots Z_n)/n$, i.e., the relative
frequency of occurrence of bursts, is established for a
broad class of functionals $\{Z_i\}$ of finite Markov chains.


I.  Introduction 

Motivated by the intention to evaluate asymptotically
multiple-burst-error-correcting codes on channels with memory,
the exponential convergence of the relative burst weight
$W_b(Z_1 \ldots Z_n)/n$ is established for a broad class of functionals
$\{Z_i\}$ of finite Markov chains (MCs). Here, b is a fixed but
arbitrary positive integer, and $W_b(Z_1 \ldots Z_n)$ denotes the number
of error bursts of length $\le b$ that appear in $Z_1 \ldots Z_n$, which
is to be viewed as the additive noise sequence of a channel.

The standard notation we employ includes $\mathbb{N} = \{1, 2, \ldots\}$,
$\mathcal{A}^*$, which denotes the set of all finite-length sequences of
symbols from $\mathcal{A}$, and $x_m^n = x_m \ldots x_n$.

We treat a stochastic process $\{Z_i\}_{i \in \mathbb{N}}$ such that $Z_i = f(U_i)$,
$i \in \mathbb{N}$, for some homogeneous MC $\{U_i\}_{i \in \mathbb{N}}$ and some function
$f : \mathcal{U} \to \mathcal{Z} = f(\mathcal{U})$, where $\mathcal{U}$ is finite, $|\mathcal{Z}| > 1$, and $\mathcal{Z}$ contains
the symbol 0. We adopt the definitions of primary notions
(such as burst weight and swept coverings) in [1, Section 2].

II.  Results  and  Derivation 

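We parse the observed sequence $Z_1 Z_2 \ldots$ into phrases

$$Z_1^{\tau_1},\ Z_{\tau_1+1}^{\tau_2},\ \ldots,$$

$\tau_1 < \tau_2 < \cdots \in \mathbb{N}$, so that $Z_1^{\tau_1}, Z_{\tau_1+1}^{\tau_2}, \ldots$ belong to

$$\mathcal{W} = \{0\} \cup (\mathcal{Z} \setminus \{0\})\mathcal{Z}^{b-1},$$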


where $(\mathcal{Z} \setminus \{0\})\mathcal{Z}^{b-1} \subset \mathcal{Z}^*$ denotes the set of $(|\mathcal{Z}| - 1)|\mathcal{Z}|^{b-1}$
phrases of length b whose leading symbols are not zero. Then,
by Corollary 1 of Hamada [1, Section 2], the number of
appearances of phrases belonging to $(\mathcal{Z} \setminus \{0\})\mathcal{Z}^{b-1}$ in the parsed
sequence up to time n is the burst weight $W_b(Z_1^n)$, where we
ignore the possible existence of the incomplete phrase in the last
position, which may cause a negligible disagreement with the
true burst weight. The point is that the above parsing
substitutes for Procedure 1 of Hamada [1, Section 2] for the purpose
of obtaining the swept covering for $\{i \in \{1, \ldots, n\} : Z_i \ne 0\}$,
whose size is $W_b(Z_1^n)$.

Let $l(n)$ denote the number of all phrases produced up
to time n in the parsing of $Z_1 Z_2 \ldots$; let $Q_n(w)$ denote the
number of occurrences of phrase w up to time n divided
by $l(n)$, $n \ge b$. Then, $W_b(Z_1^n) \approx l(n)\{1 - Q_n(0)\}$ and
$n \approx l(n)\{Q_n(0) + b(1 - Q_n(0))\}$. Therefore, approximately,

$$W_b(Z_1^n)/n \approx W_n \stackrel{\text{def}}{=} \frac{1 - Q_n(0)}{Q_n(0) + b(1 - Q_n(0))}. \qquad (1)$$
This indicates a one-to-one correspondence between (the
approximation of) $W_b(Z_1^n)/n$ and $Q_n(0)$, and hence, the
behavior of $W_b(Z_1^n)/n$ hinges on that of $Q_n(0)$.
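
The parsing rule and approximation (1) can be checked on a short sequence. The following is a minimal sketch, assuming symbols are integers with 0 as the error-free symbol; the function names and the sample sequence are hypothetical.

```python
def parse_bursts(z, b):
    """Parse z into phrases: a lone 0, or a nonzero symbol plus the next b-1 symbols.

    Returns (l, bursts, q0): the number of complete phrases, the number of
    length-b phrases (the burst weight), and the relative frequency of the
    phrase "0". A trailing incomplete phrase is ignored, as in the text.
    """
    i, zeros, bursts = 0, 0, 0
    while i < len(z):
        if z[i] == 0:
            zeros += 1          # phrase "0"
            i += 1
        elif i + b <= len(z):
            bursts += 1         # phrase of length b with nonzero leading symbol
            i += b
        else:
            break               # incomplete last phrase
    l = zeros + bursts
    q0 = zeros / l if l else 0.0
    return l, bursts, q0


def relative_burst_weight(q0, b):
    """Right-hand side of (1) as a function of Q_n(0)."""
    return (1 - q0) / (q0 + b * (1 - q0))


z = [0, 1, 0, 0, 0, 2, 2, 0]
l, bursts, q0 = parse_bursts(z, b=2)
w_n = relative_burst_weight(q0, b=2)
```

For this sequence the parse ends exactly at the last symbol, so the value from (1) coincides with the burst count divided by the sequence length; with an incomplete final phrase the two would differ slightly, which is the negligible disagreement mentioned above.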

Now consider the dissection or parsing of the underlying
MC $U_1 U_2 \ldots$ corresponding to that of $Z_1 Z_2 \ldots$:

$$V_1 = U_1^{\tau_1},\ V_2 = U_{\tau_1+1}^{\tau_2},\ \ldots,$$

where $\tau_1 < \tau_2 < \cdots$ are the time instances at which the
partitions of $Z_1 Z_2 \ldots$ occur. Clearly, $V_1, V_2, \ldots$ all belong to

$$\mathcal{V} = f^{-1}(0) \cup (\mathcal{U} \setminus f^{-1}(0))\mathcal{U}^{b-1},$$

where $(\mathcal{U} \setminus f^{-1}(0))\mathcal{U}^{b-1} \subset \mathcal{U}^*$ is the set of all phrases of length
b whose leading symbols do not belong to $f^{-1}(0)$. Then,

$$Q_n(0) = \sum_{v \in f^{-1}(0)} P_n(v), \qquad (2)$$

where the relative frequencies $P_n(v)$ of $v \in \mathcal{V}$ are defined
similarly to $Q_n(w)$, $w \in \mathcal{W}$. Note that $\{V_i\}_{i \in \mathbb{N}}$ is an MC whose
transition probabilities are

$$P(v' \mid v) = P(v' \mid \hat{v}), \quad v, v' \in \mathcal{V}, \qquad (3)$$

where $\hat{v}$ denotes the last symbol of v, and $P(x_2 x_3 \ldots x_n \mid x_1) =
\prod_{i=1}^{n-1} P(x_{i+1} \mid x_i)$ is determined by the transition probabilities
$P(u' \mid u)$, $u, u' \in \mathcal{U}$, of the underlying MC $\{U_i\}_{i \in \mathbb{N}}$ for any
$x_1^n \in \mathcal{U}^n$, $n \in \mathbb{N}$, $n \ge 2$. Thus, from (1), (2), and the strong
law of large numbers for MCs, we have

Theorem 1 If the phrase-to-phrase MC $\{V_i\}$ is irreducible,
and $\Pi$ is the stationary distribution of $\{V_i\}$, then

$$\frac{W_b(Z_1^n)}{n} \to \frac{1 - y}{y + b(1 - y)} \quad (n \to \infty) \quad \text{almost surely},$$

where $y = \sum_{v \in f^{-1}(0)} \Pi(v)$.
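
As an illustration of (3) and Theorem 1, the sketch below builds the phrase-to-phrase chain for a small underlying MC and evaluates the limit. The two-state transition matrix, the choice that f is the identity, and the use of plain power iteration (which assumes the phrase chain is also aperiodic) are assumptions made for the example only.

```python
from itertools import product


def phrase_chain(P, zero_states, b):
    """Return the phrase set V and the phrase-to-phrase transition matrix of (3).

    P[u][u2] is the transition probability of the underlying MC and
    zero_states plays the role of f^{-1}(0).
    """
    states = range(len(P))
    phrases = [(u,) for u in states if u in zero_states]
    phrases += [(u,) + tail
                for u in states if u not in zero_states
                for tail in product(states, repeat=b - 1)]

    def prob(v_next, last):
        # P(v' | last symbol of v) as a product of one-step transitions.
        p = 1.0
        for x in v_next:
            p *= P[last][x]
            last = x
        return p

    T = [[prob(v2, v1[-1]) for v2 in phrases] for v1 in phrases]
    return phrases, T


def stationary(T, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state noise chain; state 0 means "no error".
P = [[0.9, 0.1],
     [0.4, 0.6]]
b = 2
phrases, T = phrase_chain(P, zero_states={0}, b=b)
pi = stationary(T)
y = sum(p for v, p in zip(phrases, pi) if v[0] == 0)
limit = (1 - y) / (y + b * (1 - y))
```

For this example the phrases are 0, 10 and 11, and solving the balance equations by hand gives y = 6/7, so the limiting relative burst weight is 1/8.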

This  result  can  be  strengthened  by  the  method  of  types: 

Theorem 2 Let $J \subset [0, 1/b]$ be an interval whose end points
are distinct. If $\{V_i\}$ is irreducible, then

$$\lim_{n \to \infty} n^{-1} \log \Pr\{W_b(Z_1^n)/n \in J\} = -\inf_{\Phi \in T} D(\Phi \| P)/L(\Phi),$$

where  T  =  f  i(o)  ^(u))  €  J,  =  4>|,  $  and  $ 

denote  the  two  marginals  of  a  probability  distribution  $  on 
V  X  V  as  in  [2,  p.  790],  the  usage  of  D  in  [2,  Eq.  (12)]  is 
also  adopted,  P  is  given  in  (3),  and  L($)  =  $(u)  X 

( length  of  v). 
Abstract — In a constant weight w code of length n, each
code word has w 1's and n - w 0's. If the ratio w/n is low,
the code is referred to as a low constant weight (LCW)
code. In this paper, some simple designs of LCW codes
are presented. Further, the speed performance of these
codes is derived, and it is then shown that these codes have
much better performance than the dual-rail codes when
used in asynchronous buses.

Index terms: low weight codes, constant weight codes,
unordered codes, proximity detecting codes, asynchronous
communication systems, low power systems.

In [2], it has been shown that the lower the weight of the codes,
the faster the implementation of an asynchronous bus. Hence,
low weight codes find a direct application in the design of parallel
asynchronous communication systems. Low weight codes also find
application in the design of low power VLSI systems [3].

This paper presents some low constant weight codes which are
very efficient in terms of speed if used in realizing an asynchronous
communication system. They are also efficient in terms of
complexity and redundancy. First their construction will be sketched and
then their speed performance will be quantified. Let $S_w^k$ indicate
the set of all words of length k and weight w, and let DC(n, k, w)
indicate a binary block code of length n, constant weight w and k
information bits.

DC(n = 4, k = 2, w = 1) code design. This design is defined
by the following encoding function, $\mathcal{E} : \mathbb{Z}_2^2 \to S_1^4$, for the code:
$\mathcal{E}(00) = 0001$, $\mathcal{E}(01) = 0010$, $\mathcal{E}(10) = 0100$, $\mathcal{E}(11) = 1000$. Note
that both encoding and decoding functions can be realized with
extremely simple logic. Note also that by concatenating this code with
itself it is possible to obtain very simple DC(n = 2k, k, w = 0.5k)
codes, which require the same number of redundant bits as the usual
dual-rail code, but the number of 1's in each code word is only half
that of a dual-rail code word.
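
The DC(4, 2, 1) mapping and its concatenation can be written out as a small sketch. The table below copies the four code words given above; the helper names and the bit-string representation are hypothetical choices for the example.

```python
# Encoding table of the DC(n = 4, k = 2, w = 1) code as given in the text.
ENCODE = {"00": "0001", "01": "0010", "10": "0100", "11": "1000"}
DECODE = {code: data for data, code in ENCODE.items()}


def encode(bits):
    """Concatenated encoding: map every 2 information bits to 4 code bits."""
    if len(bits) % 2:
        raise ValueError("number of information bits must be even")
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(word):
    """Inverse mapping; raises KeyError on a block that is not a code word."""
    if len(word) % 4:
        raise ValueError("code word length must be a multiple of 4")
    return "".join(DECODE[word[i:i + 4]] for i in range(0, len(word), 4))


data = "10110001"           # k = 8 information bits
word = encode(data)         # n = 2k = 16 bits, weight 0.5k = 4
```

With k = 8 the concatenated word has length 16 and weight 4, compared with weight 8 for a dual-rail word of the same length.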

DC(n = 7, k = 4, w = 2) code design. This design is defined
by the following encoding function $\mathcal{E} : \mathbb{Z}_2^4 \to S_2^7$ for the code, which
is defined in terms of Boolean logic (given $x, y \in \mathbb{Z}_2$, let $x \cdot y$ indicate
the logical AND between x and y, $x \vee y$ the logical OR between x
and  y,  and  x  the  logical  NOT  of  x):  £(11x2^3^4)  =  (xi  -X2,  &l 

XT-X2,  X3-X4  V  XI-X2-X3,  X3 -X4  V  xf  -xj-xj,  X3  ■  X4  V  xj -X2  •  X4 , 
x^-xTV  xT-xJ-xJ).  Whereas,  £— 1  (vi  1/2 J/3 1/4 J/5 1/6 V7>  =  (yi  V  y2, 
Vi  V  2/3,  2/4  V  2/6 '2/7i  2/6  V  V5 ‘2/7 ) •  Also  in  this  case  both  encoding  and 
decoding functions can be realized with very simple logic. Further,
note that by concatenating this code with itself it is possible to obtain
DC(n = 1.75k, k, w = 0.5k) codes, which have the same features as
the codes given above but are less redundant.

DC(n = 13, k = 8, w = 3) code design. Note that any constant
weight 3 coding method requires at least 5 extra check bits to encode
8-bit data. Further, using 5 check bits, 8 is the maximum length
of information word that can be made constant weight 3. Thus,
this code is optimal from the redundancy point of view. The code
design is also simple because the whole coding system (encoder
plus decoder) for this code can be implemented using fewer than
1070 transistors with a depth of less than 30 transistors. Because
of the space limitation, the code design is not given here. Note that
by concatenating this code with itself it is possible to obtain efficient
DC(n = 1.625k, k, w = 0.375k) codes.

Speed performance analysis of LCW codes in the asynchronous
communication scheme. Parallel asynchronous communication
in asynchronous buses is realized using unordered codes
[1], [4], [2]. Researchers in [1] have modeled the asynchronous
communication scheme as a situation in which the sender communicates
with the receiver using n parallel tracks (the bus lines) by rolling
Tab. 1: Minimum speed-up comparisons between some t-PD codes
and the dual-rail code with k = 8 information bits used as 0-PD
code. The codes DC(w) are constant weight w codes. The code
BC(w = 5.36) is a Berger-like code designed to minimize the average
weight per code word. The codes 1-VP(w = 7.29) and 2-VP(w =
7.73) are, respectively, systematic 1- and 2-PD codes of [4].

marbles in the tracks. If the i-th component of the code word is a
1, then the sender rolls a marble in the i-th track of the bus. The
amount of time a marble takes to travel from the source to the
destination is unknown and may differ from track to track or even
from roll to roll, but it is non-negative and finite. The only way the
receiver can detect the complete reception of the code word is to
perform a membership test of the current word in the unordered code
at the receiver end of the bus. Once complete reception is detected,
the receiver sends an acknowledgment signal to the sender
indicating that it is ready to receive the next code word. In the usual
implementation of the scheme, the receiver detects the complete
reception of the word when the last marble of the word is received.
This is a special case of t-proximity detection [4], in which certain
t-proximity detecting (t-PD) codes are used to allow the receiver to
send the acknowledgment signal to the sender when all but t of the
transmitted 1s (marbles) of a code word have been received.
Examples of t-PD codes are constant weight codes, for all $t \ge 0$ [4]. For
$j = 1, 2, \ldots, w$, let $X_j$ be the random variable which represents the
transmission time for the j-th marble of the word. In [2], assuming
that the $X_j$'s are continuous, independent and all uniformly
distributed over the time interval $[t_{\min}, t_{\max}]$, it is shown that the
average transmission time for a code word of a t-PD code is

$$\bar{T}_{t\text{-}PD}(w) = t_{\min} + \frac{w - t}{w + 1}\,(t_{\max} - t_{\min}).$$

In this paper, using the above formula we are able to quantify the
speed performance of t-PD codes, $t \ge 0$, and make the transmission
time comparisons given in Table 1. Analogous conclusions can be
drawn for distributions which are different from the uniform
distribution given above.
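
The average transmission time formula can be evaluated directly. The sketch below is illustrative only: the function name, the normalized interval [0, 1], and the chosen comparison are assumptions and do not reproduce the entries of Table 1. It uses the fact that a dual-rail word for k information bits has weight k.

```python
def average_time(w, t, t_min, t_max):
    """Average transmission time of a weight-w code word used as a t-PD code."""
    if not 0 <= t <= w:
        raise ValueError("t must satisfy 0 <= t <= w")
    return t_min + (w - t) / (w + 1) * (t_max - t_min)


# Normalized delay interval [0, 1] (hypothetical values).
dual_rail = average_time(w=8, t=0, t_min=0.0, t_max=1.0)   # dual-rail, k = 8
dc_13_8_3 = average_time(w=3, t=0, t_min=0.0, t_max=1.0)   # DC(13, 8, 3) as 0-PD
speed_up = dual_rail / dc_13_8_3
```

Lower weight moves the waiting time away from the worst-case delay, since the receiver waits for the last of fewer marbles.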
Abstract — In this paper, we consider the concatenation
of turbo codes and M-ary orthogonal modulation.
A modified decoding algorithm that utilizes the
correlated nature of an orthogonal symbol drawn from
a Hadamard matrix and the soft-input/soft-output
module of turbo codes is introduced. This improved
decoding algorithm can significantly lower the error
shoulder and reduce the SNR required for a given
error probability.

I.  Introduction 

In our research, a concatenation of turbo codes (outer code)
and orthogonal codes (inner code) is investigated for PCS
applications. We propose a two-stage joint decoding of both
turbo codes and orthogonal codes. The proposed algorithm
iteratively evaluates the log-likelihood metrics for both
systematic and turbo-encoded parity bits and feeds the a priori
information back to the orthogonal decoder.


II.  System  Model  and  Joint  Decoding 

An information sequence, $d_k \in \{0, 1\}$, is encoded with
a rate 1/3 turbo code. The coded sequence is then
multiplexed onto a single data stream, block interleaved to break
up the correlation among the consecutive bits, and passed to
an orthogonal modulator. The orthogonal modulation is a
Hadamard matrix with rate $r = \log_2 M / M = K/N$. An
additive white Gaussian noise (AWGN) channel with and without
Rayleigh fading is considered in this paper and noncoherent
reception is used at the receiver.

The  proposed  iterative  decoding  process  is  composed  of  two 
stages.  The  first  stage  is  the  maximum  a  posteriori  (MAP) 
decoding  of  the  Hadamard  matrix.  The  second  stage  is  the 
modified  turbo  decoding. 

For the first stage decoding, the received complex signal
is processed with the fast Hadamard transform and then
square-law combined to form a decision vector w, where
$w = \{w_0, \ldots, w_{N-1}\}$. If the i-th row of the Hadamard matrix
($H_i$) is sent, the conditional probability is given as follows:

$$p(w_j \mid H_i) = \begin{cases} p_s(w_j) = \dfrac{1}{2\sigma^2}\, e^{-\frac{w_j + z^2}{2\sigma^2}}\, I_0\!\left(\dfrac{\sqrt{w_j}\, z}{\sigma^2}\right) & \text{for } j = i \\[2ex] p_n(w_j) = \dfrac{1}{2\sigma^2}\, e^{-\frac{w_j}{2\sigma^2}} & \text{for } j \ne i \end{cases}$$

where $\sigma^2$ is the average noise variance, $z^2$ is the expected
received signal energy and $I_0(\cdot)$ is the zeroth-order modified
Bessel function of the first kind.

By applying Bayes' law, the a posteriori probability
can be expressed as $P(H_i \mid w) = p(w \mid H_i) P(H_i) / p(w)$, where
$p(w \mid H_i) = p_n(w_0) \cdots p_s(w_i) \cdots p_n(w_{N-1})$ and $p(w)$ is a
constant independent of i. With the assumption of large
packet size and proper interleaving, the information bits
fed into one orthogonal symbol $(x_0 \ldots x_{K-1})$ are assumed
to be independent and therefore the a priori information
$P(H_i) = P(x_0) P(x_1) \cdots P(x_{K-1})$. The next step is to evaluate
the conditional probability of the K systematic bits of the
Hadamard function. This can be done by summing over the
objective $x_k$, $P(x_k = 0 \mid w) = \sum_{i=0,\, x_k = 0}^{N-1} P(H_i \mid w)$. Finally, we can
evaluate the density function $p(w \mid x_k)$ by applying Bayes' law.
This information will be passed on to the turbo decoding.
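
The first-stage computation can be sketched as follows, assuming the square-law outputs follow the standard noncentral and central chi-square densities with two degrees of freedom, so that the likelihood of row i relative to the others depends only on the Bessel factor $I_0(\sqrt{w_i} z / \sigma^2)$. The function names, the series evaluation of $I_0$, and the convention that bit k of the row index i carries $x_k$ are assumptions for illustration.

```python
import math


def bessel_i0(x, terms=60):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def row_posteriors(w, z2, sigma2, priors=None):
    """P(H_i | w) for square-law outputs w, signal energy z2, noise variance sigma2.

    Factors common to all rows (including exp(-z2 / (2 sigma2)) and the
    product of the noise-only densities) cancel in the normalization.
    """
    n = len(w)
    priors = priors or [1.0 / n] * n
    z = math.sqrt(z2)
    weights = [p * bessel_i0(math.sqrt(wi) * z / sigma2)
               for p, wi in zip(priors, w)]
    total = sum(weights)
    return [x / total for x in weights]


def bit_zero_probability(post, k):
    """P(x_k = 0 | w): sum of row posteriors whose index has bit k equal to 0."""
    return sum(p for i, p in enumerate(post) if not (i >> k) & 1)


# Hypothetical N = 4 example in which row 2 (binary 10) was most likely sent.
post = row_posteriors([0.1, 0.2, 9.0, 0.3], z2=4.0, sigma2=1.0)
p_bit0_is_zero = bit_zero_probability(post, 0)
p_bit1_is_zero = bit_zero_probability(post, 1)
```

Passing non-uniform `priors` corresponds to the feedback step described later, where the turbo decoder's output is returned to the orthogonal decoder as a priori information.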

The second stage is an iterative decoding process and the
derivation of this algorithm is well described in previous
papers. After several turbo iterations, the log-likelihood ratio
(LLR) of the systematic bits will converge. The LLRs of the
systematic bits are used as a priori information to decode the
turbo-encoded parity bits [2]. The decoding process is similar
to that of the systematic bits. Finally, all the LLR information
is fed back as a priori information to the inner orthogonal
code. In this two-stage sequential decoding, the quality of the
turbo decoding output is very important.

III.  Result  and  Conclusion 

Figure 1 shows the bit error rate (BER) of the 64-ary
system in both AWGN and Rayleigh fading channels with
a packet size of 1200 information bits. The system without
feedback to the orthogonal codes (i.e., without joint
decoding (JD)) but with 10 turbo decoding iterations is compared
to the system with 5 initial turbo decoding iterations, one
feedback to the orthogonal decoding, and 5 additional turbo
decoding iterations. The proposed JD algorithm achieves a
significant reduction in error probabilities. The error
shoulder introduced by the turbo code is also shown to be lowered
to the region beyond interest.


Fig.  1:  BER  for  64-ary  noncoherent  reception. 
Abstract — A Turbo nonlinear continuous phase frequency
shift keying (CPFSK) scheme with iterative maximum a posteriori
probability (MAP) decoding is proposed. It uses the nonlinear
CPFSK encoding components proposed by us. They have
nonlinear characteristics and good performance of power and
bandwidth with simple structures.

I.  Introduction 

Turbo codes, newly invented by Berrou et al. [1], are good
error-correcting codes that yield remarkable bit error rates
(BER) close to Shannon limits with simple encoding structures.
They were proposed for BPSK modulation, but when more
spectral efficiency is needed, such as in satellite communications,
other modulations with constant envelopes may be used.

In constant envelope digital modulation the
information-carrying phase varies continuously, which reduces the side lobes
of the signal spectrum. Such a modulation scheme is called
continuous phase modulation (CPM) [2]. CPM can be
decomposed into the continuous phase encoder (CPE) and the
memoryless signal mapping (MM) modulation part in the same
way as trellis-coded modulation (TCM). The nonlinear CPFSK
introduces a nonlinear modulation index, which is a kind of
multi-h CPM with a non-time-varying phase trellis. We
have proposed an M-ary nonlinear CPFSK scheme to achieve
more spectrum efficiency [3]. It also achieves better
performance than ordinary CPFSK.

In  this  paper,  we  propose  the  Turbo  nonlinear  CPFSK 
systems  with  iterative  MAP  decoding.  The  overall  structure  is 
similar  to  Turbo  codes  or  Turbo  TCM  [4]  but  uses  the  proposed 
nonlinear  CPFSK  encoding  components  that  allow  better 
performance  than  RSC  components  in  Turbo  code.  The  Turbo 
nonlinear  CPFSK  code  has  nonlinear  characteristics  and  good 
performance  of  power  and  bandwidth  with  simple  structures. 
The  proposed  schemes  improve  the  performance  relatively  over 
Turbo  code  with  modulations  and  overcome  the  nonlinear 
property,  so  they  can  provide  reliable  communications  such  as 
in  satellite  channels. 

II. The Turbo Nonlinear CPFSK

The  decomposition  of  the  nonlinear  CPFSK  —  the 
decomposition  of  the  MM  modulator  and  the  nonlinear 
continuous  phase  encoder  (NCPE). 

A new representation of coded symbols considered as the
sum of the product input and the past symbols of the
convolutional encoder is expressed by

$$s(t, u_n) = \sqrt{\frac{2E}{T}} \cos(\omega_0 t + \psi(t, u_n)), \qquad (1)$$

where E is the bit energy, T is the symbol duration, and $\omega_0$ is the
carrier frequency. To produce the present input symbol $u_n$ and the
present state $V_n$, it introduces the nonlinear symbol into the CPE and
the nonlinear MM modulator. Then the inputs of the nonlinear MM
modulator can be represented as the M-ary symbol $u_n$ and $V_n$.

'V(t,un)=  2 modN  [u„ ^  +  modN  [v„  ] |j  ( 2 ' a) 


=4*./)  =  /o  +XM.-1  +  ^2'b^ 

i=1 

Vn=an_rM'-'+...  +  an„c+rM\  (2-c) 

V„+1  -  modN  [v„  (2:d) 

where u, V, and the nonlinear mapping coefficients f are defined in
the modulo-N space, M denotes the size of the M-ary symbol alphabet, and c is
the number of input symbols at the nonlinear MM.

The  overall  system  —  we  apply  the  basic  principles  and 
modified  structures  of  Turbo  codes  for  the  Turbo  nonlinear 
CPFSK.  A  search  for  good  component  codes  is  performed  from 
Eq. (4). In the presence of AWGN, the probability of the
nonlinear CPFSK maximum likelihood (ML) receiver making an
erroneous decision can be closely approximated by

$$P_e \approx Q\left(\sqrt{\frac{d_{\min}^2 E_b}{N_0}}\right), \qquad (3)$$

$$d_{\min}^2 = \frac{1}{T} \min_{a,b} \lim_{N_r \to \infty} \int_0^{N_r T} \left[1 - \cos(\psi(t,a) - \psi(t,b))\right] dt, \qquad (4)$$

where $E_b$ is the bit energy, $d_{\min}^2$ is the normalized minimum squared Euclidean
distance and $N_r$ is the number of intervals of the remerging path.
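
A short numerical sketch of the error probability approximation follows, assuming it takes the usual CPM form $P_e \approx Q(\sqrt{d_{\min}^2 E_b / N_0})$ with the Gaussian tail function Q. The function names and the sample distance values are hypothetical; 2.0 is the textbook normalized distance of MSK and is used only as a reference point, not as a result for the proposed codes.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ml_error_probability(d2_min, ebn0_db):
    """Approximate ML error probability for normalized distance d2_min."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)      # convert dB to a linear ratio
    return q_function(math.sqrt(d2_min * ebn0))


# Compare two hypothetical component schemes at Eb/N0 = 6 dB.
p_ref = ml_error_probability(2.0, 6.0)
p_better = ml_error_probability(3.0, 6.0)
```

A larger $d_{\min}^2$ found by the search in (4) translates directly into a smaller argument-dependent tail probability at the same $E_b/N_0$.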

The  iterative  decoder  consists  of  two  identical  concatenated 
decoders  of  the  component  codes  separated  by  the  interleaver. 
The  component  decoders  are  based  on  MAP  algorithms 
generating  weighted  soft  estimates  of  the  input  sequences. 

III. Conclusions


In  this  paper,  we  have  presented  Turbo  nonlinear  CPFSK 
systems  that  have  iterative  MAP  decoders.  The  overall  structure 
is  similar  to  Turbo  codes  or  Turbo  TCM  codes  but  exploits  the 
non-linearity  of  the  component  codes.  They  have  nonlinear 
characteristics  and  good  performance  of  power  and  bandwidth 
with  simple  structures. 
Abstract — We present upper bounds for bit error
rates of Turbo Coded Modulation (TuCM) on AWGN
channels. We then apply these bounds to calculate
the spectral efficiency of adaptive TuCM on flat fading
channels, which comes within 4 dB of the fading
channel capacity limit.


I.  Transfer  Function  bounds 


Techniques that compute bounds on the bit error rate of coded
systems based on the input-output transfer function of the
state diagram describing the system are referred to as transfer
function bounds. Previous work describes such bounds
for trellis codes and turbo codes¹ with binary modulation [1].
We extend this idea and apply it to Turbo Coded Modulation
(TuCM) with permuter size N, where the output bits are
mapped to a higher level constellation. Let the sets $S_1$ and
$S_2$ represent all possible states and the sets $B_1$ and $B_2$ represent
all possible edges in the state diagram of the constituent
encoders 1 and 2 of the turbo code under consideration. Let
$E_1$ and $E_2$ denote the sets of all possible error states and $C_1$
and $C_2$ the sets of all possible edges in the error state diagrams
of the constituent encoders 1 and 2. Then we define
a super-state for constituent encoder 1 as an element of the
set $Z_1 = \{(s_1, e_1) : s_1 \in S_1, e_1 \in E_1\}$ and a super-edge as
belonging to the set $W_1 = \{[b_1, c_1] : b_1 \in B_1, c_1 \in C_1\}$.
Similarly, super-state set $Z_2$ and super-edge set $W_2$ can be defined
for constituent encoder 2. We now define a combined-state
as one that belongs to the set $\{(z_1, z_2) : z_1 \in Z_1, z_2 \in Z_2\}$
and a combined-edge as belonging to the set $\{[w_1, w_2] : w_1 \in
W_1, w_2 \in W_2\}$. The combined-states and combined-edges in
a graphical representation form a combined state diagram. In
this diagram, each combined-edge $[w_1, w_2]$ has a label of the
form $I^i J^j X^x Y^y D^{d^2} L$, where I, J, X, Y, D, L are dummy
variables which carry useful information in their exponents. Here i, j
equal the input weights of the constituent encoders 1 and 2
corresponding to the combined-edge $[w_1, w_2]$. Similarly, x, y
represent weights of error patterns and d the Euclidean
distance between correctly and incorrectly decoded codewords
corresponding to $[w_1, w_2]$. The variable L is present on each
combined-edge to denote a transition. This combined-state
diagram can be treated as a signal flow graph with the labels
on the edges being treated as gains, and its transfer
function from the all-zero state back to the all-zero state can
be obtained. The coefficient of $L^N$ in a series expansion of


this transfer function can be written as $T(I, J, X, Y, D) =
\sum_{1 \le i,j,x,y \le N} \sum_d a_{i,j,x,y,d}\, I^i J^j X^x Y^y D^{d^2}$ and the BER can be

p-d*/4N0 

bounded  as  Pbit  <  Ei<i,3<iv  Ed  fr  - , where 


⁰This work was supported by ONR Young Investigation Award
N00014-99-1-0578.

¹which typically consist of two constituent encoders in parallel.


$N_0/2$ is the power spectral density of the noise. This
expression can be simplified if the outputs of the two
constituent encoders of the TuCM are modulated independently.
If the coefficient of $L^N$ in the transfer function
of the super-state diagram of the jth constituent encoder²
is $T_j(X, I, D) = \sum_{1 \le i,x \le N} \sum_d n_{j,x,i,d}\, X^x I^i D^{d^2}$, then the
BER of the net TuCM scheme can be bounded as $P_{bit} \le$

Ei<i,i<jv  f((x) (T))  1  Edni.*.*.<in2,i,i,<ie  ^  °-. An example
code from [2] with 16-state constituent encoders, N =
4096 and 8PSK per encoder was chosen, with feedback
polynomial $h_0 = 23$ and $h_1 = 14$, $h_2 = 16$, $h_3 = 21$, $h_4 = 31$ as
feedforward polynomials and reordered mapping. The bound
matched simulation results within 1.5 dB at high SNR.
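
The super-state and combined-state sets defined in this section are Cartesian products, which the following minimal sketch enumerates. The toy sizes (4 encoder states and 4 error states per constituent encoder) are hypothetical and chosen only to show how quickly the combined state diagram grows.

```python
from itertools import product


def super_states(states, error_states):
    """Z_j = {(s_j, e_j)}: all pairs of an encoder state and an error state."""
    return list(product(states, error_states))


def combined_states(z1, z2):
    """All pairs (z_1, z_2) of super-states of the two constituent encoders."""
    return list(product(z1, z2))


# Hypothetical toy sizes for S_1, S_2, E_1, E_2.
S1 = S2 = range(4)
E1 = E2 = range(4)
Z1 = super_states(S1, E1)
Z2 = super_states(S2, E2)
combined = combined_states(Z1, Z2)
```

Even for these small sets the combined diagram has 256 states, which suggests why the simplification for independently modulated constituent encoders, working with one super-state diagram at a time, is attractive.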

II.  Average  bounds  and  Adaptive  Modulation 

Next, we propose an average bound for Self Concatenated
Coded Modulation (SCCM), realizing that TuCM are special
cases of these. We use the transfer function technique
described in Section I, but this time average over all possible
rate b/n recursive constituent encoders of memory k to
get the Error-path Length Generating Function (ELGF) as

$$A(M, L) = \sum_{i,j} a_{i,j} M^i L^j,$$

where M and L are dummy variables
representing error weight and error length respectively.
Using  the  definition  for  Ro  in  [3]  and  averaging  over  all  scram¬ 
blers3,  we  obtain  Pbit  <  l/N2-(N-k6)<9A(M,L)/<9M  after 
substituting  M  =  1  and  L  =  22nR°.  Considering  an  adap¬ 
tive  coded  modulation  system  with  model,  and  constraints  as 
in [4], we use the average bound to calculate spectral efficiency with 16-state constituent codes and N = 1024. The results obtained show a 2 dB gain in SNR compared to trellis coded modulation systems, with a spectral efficiency that comes within 4 dB of channel capacity.
Abstract — We develop new, low-complexity turbo codes suitable for bandwidth and power limited systems. These codes are constructed as an extension of Repeat-Accumulate codes to high-level modulations. Two design criteria are proposed, based on maximum-likelihood decoding and on Gaussian density evolution in iterative decoding.

I.  Structure  of  SCTCM  with  Rate-1  Inner  Code 

Our recent results on the concatenation of an outer code with a simple accumulator as inner code for binary modulation [2] led us to develop a new method for serial concatenated TCM (SCTCM). For MPSK, or a two-dimensional constellation with M points, define $m = \log_2 M$. We propose a novel method to design low-complexity serial concatenated TCM, which achieves bm/(b + 1) bits per modulation symbol, using an outer rate b/(b + 1) binary convolutional code (or a short block code) with maximum free Hamming distance. An interleaver π permutes the output of the outer code. The interleaved data enters a rate m/m = 1 recursive convolutional inner encoder. The m output bits are then mapped to one symbol belonging to a $2^{m}$-level modulation.

The inner code and the mapping are jointly optimized. For short blocks we use the ML criterion based on maximizing the effective free Euclidean distance of the inner TCM (see [1] for more detail on ML design criteria). For large block sizes we use a new minimum-threshold criterion for iterative decoding, to be discussed shortly. Considering 8PSK (m = 3) modulation as an example, the throughput r = 3b/(b + 1) is as follows: for b = 2, r = 2; for b = 3, r = 2.25; and for b = 4, r = 2.4. This suggests that we can use a rate 1/2 convolutional code with puncturing to obtain various throughputs without changing the inner code or modulation.

For rectangular $M^{2}$-QAM, where $m = \log_2 M$, the structure becomes even simpler. In this case, to achieve a throughput of 2mb/(b + 1) bits/symbol we need a rate b/(b + 1) outer code and a rate m/m inner code, where the m output bits are alternately assigned to the in-phase and quadrature components of the $M^{2}$-QAM modulation. For example, consider 16-QAM modulation, where m = 2; then the throughput r = 4b/(b + 1) is: for b = 1, r = 2; for b = 2, r = 2.67; for b = 3, r = 3; and for b = 4, r = 3.2.
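The throughput values quoted for 8PSK and 16-QAM follow directly from the two rate formulas. The short Python sketch below reproduces them with exact fractions; the function names are illustrative and not part of the paper.

```python
from fractions import Fraction

def throughput_mpsk(b, m):
    """Bits per modulation symbol, b*m/(b+1), for a 2^m-point constellation."""
    return Fraction(b * m, b + 1)

def throughput_qam(b, m):
    """Bits per symbol, 2*m*b/(b+1), for rectangular M^2-QAM with m = log2(M)."""
    return Fraction(2 * m * b, b + 1)

if __name__ == "__main__":
    for b in (2, 3, 4):
        print("8PSK   b =", b, "r =", float(throughput_mpsk(b, 3)))
    for b in (1, 2, 3, 4):
        print("16-QAM b =", b, "r =", round(float(throughput_qam(b, 2)), 2))
```

Using `Fraction` keeps values such as 8/3 exact until they are printed.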

Here we only discuss the example of 16QAM modulation and r = 3, which implies b = 3. The encoder structure of SCTCM for a 4-state inner TCM and a 4-state outer code is shown in Fig. 1 as an example.

II.  Iterative  Decoding  Design  Criteria 

The design criterion is based on the method of density evolution proposed by Richardson and Urbanke [3]. It has been observed by many researchers that the extrinsic information in iterative decoding can be approximated by a Gaussian density function. El Gamal [4] considered the soft-input soft-output APP module in turbo decoders as a signal-to-noise ratio (SNR) transformer. Using these ideas, and the

*The  work  described  was  funded  by  the  TMOD  Technology  Program  and 
performed  at  the  Jet  Propulsion  Laboratory,  California  Institute  of  Technology 
under  contract  with  the  National  Aeronautics  and  Space  Administration. 


method for analyzing turbo codes suggested by El Gamal [4], we extended the results to analyze concatenated TCM by approximating the density functions for extrinsics as Gaussian densities, and then computing the mean and variance in the Gaussian density evolution. Since the probability of incorrect decoding is given by $Q(\sqrt{SNR})$, where SNR = mean$^2$/variance, we only need to track the SNR. This will result in a slightly pessimistic threshold since the Gaussian density has the highest entropy for a given variance. Slightly optimistic threshold results are obtained if we impose density consistency as proposed by Richardson et al., which suggests that we only need to compute the mean (SNR = mean/2). At each iteration, we computed SNRs (averaged over all transmitted patterns), and collected them for the outer and the inner codes. We used the example of a 4-state outer code with puncturing pattern 100100... and a 4-state rate-1 inner code as shown in Fig. 1. The output-input SNR for the inner code and the input-output SNR for the outer code are shown in Fig. 1. Iterations for $E_b/N_0 = 5.5$ dB are also shown in Fig. 1. If the two curves do not cross, then the iterative decoder converges. Note that we used all assumptions made by Richardson and Urbanke for very large block sizes and the concentration theorem. In Fig. 1 we see that if $E_b/N_0$ is greater than 4.8 dB, the iterative decoder converges, where the capacity limit is 4.54 dB. This method was used to select the 2-state and the 4-state inner TCM codes. The performance of iterative decoding of this serial TCM with 16QAM for an input block size of 12288 bits and 8 iterations required $E_b/N_0 = 6$ dB at BER = $4 \times 10^{-8}$.
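The two ways of tracking the SNR described above can be illustrated with a minimal Python sketch. It assumes the error probability is the Gaussian tail function Q evaluated at the square root of the SNR, and that a consistent Gaussian density has variance equal to twice its mean; the function names are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr_from_moments(mean, variance):
    """SNR of a Gaussian extrinsic density, tracked as mean^2 / variance."""
    return mean * mean / variance

def snr_consistent(mean):
    """SNR when consistency (variance = 2 * mean) is imposed, giving mean / 2."""
    return mean / 2.0

def error_probability(snr):
    """Probability of incorrect decoding under the Gaussian approximation."""
    return q_function(math.sqrt(snr))

if __name__ == "__main__":
    mean = 4.0
    print(snr_from_moments(mean, 2.0 * mean), snr_consistent(mean))
    print(error_probability(snr_consistent(mean)))
```

When the variance really is twice the mean, both SNR definitions agree, which is why only the mean needs to be tracked in the consistent case.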


Figure 1: Graphical analysis of iterative decoding threshold (16QAM, puncturing pattern 100, r = 3).
Abstract — In this paper, we construct quasi-cyclic Goppa (or related) codes. Some of these codes reach the parameters of best known codes. Generally, there is no known quasi-cyclic code for these lengths and dimensions.

I.  Introduction 

Alternant codes are subfield subcodes of Generalized Reed-Solomon (GRS) codes. Goppa codes are a particular case of Alternant codes. In [2], we proved that parity-check subcodes of Goppa codes and extensions of Goppa codes are also Alternant codes. In [3], A. Dür characterized the automorphisms of GRS codes. Some semi-monomial automorphisms of GRS codes that are not permutations of the support induce a permutation of the subfield subcodes [1, 2]. We use this method for constructing Goppa codes invariant under some prescribed permutations.

II.  Classical  Goppa  codes 

Let K be the finite field $GF(p^{m})$ and $\bar{K} = K \cup \{\infty\}$. Let $C = (\alpha_0, \ldots, \alpha_{n-1})$ be an n-tuple of distinct elements of K. It will be the support of the codes. Let $v = (v_0, \ldots, v_{n-1})$ be an n-tuple of non-zero elements of K. For s = 0, ..., n, let $\theta_{s,v,C}$ be the n-tuple $\theta_{s,v,C} = (v_0\alpha_0^{s}, \ldots, v_{n-1}\alpha_{n-1}^{s}) \in K^{n}$.

Definition 1 Let k be an integer less than n. The Alternant code $A_k(v, C)$ is the code of length n over GF(p) with parity-check matrix $M_k(v, C)$ whose rows are $\theta_{s,v,C}$ for s = 0, ..., k - 1.

Let $g(x) \in K[x]$ be a polynomial of degree r < n such that $g(\alpha_i) \ne 0$ for i = 0, ..., n - 1. The Goppa code $Q(g, C)$ with Goppa polynomial g(x) and support C is the Alternant code $A_r(v_{g,C}, C)$ with $v_{g,C} = (g(\alpha_0)^{-1}, g(\alpha_1)^{-1}, \ldots, g(\alpha_{n-1})^{-1})$.
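The parity-check rows of a Goppa code can be sketched in a few lines of Python. To stay self-contained, the sketch assumes the simplest case m = 1 with the prime field GF(7), so that ordinary modular arithmetic suffices and the subfield subcode is the code itself; the polynomial x^2 + 1 and the support are illustrative choices, not taken from the paper.

```python
P = 7  # assumed prime field GF(7), i.e. the special case m = 1

def poly_eval(coeffs, x, p=P):
    """Evaluate a polynomial (coefficients from degree 0 upward) at x modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def goppa_parity_check(g, support, p=P):
    """Rows (v_i * alpha_i^s) for s = 0..r-1, with v_i = g(alpha_i)^(-1)."""
    r = len(g) - 1
    # pow(..., -1, p) raises ValueError if g vanishes on the support.
    v = [pow(poly_eval(g, a, p), -1, p) for a in support]
    return [[(vi * pow(a, s, p)) % p for vi, a in zip(v, support)]
            for s in range(r)]

g = [1, 0, 1]              # g(x) = x^2 + 1, which has no root in GF(7)
support = list(range(P))   # support (0, 1, ..., 6)
H = goppa_parity_check(g, support)

if __name__ == "__main__":
    for row in H:
        print(row)
```

For a genuine extension field $GF(p^{m})$ with m > 1, the modular arithmetic would have to be replaced by extension-field arithmetic and the rows expanded over GF(p).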

Definition 2 The parity-check subcode of a code C is the subset of elements $x = (x_0, \ldots, x_{n-1}) \in C$ satisfying the parity-check control $x_0 + x_1 + \ldots + x_{n-1} = 0$.

The extension of C is obtained by adding a parity-check control symbol $x_n = -(x_0 + x_1 + \ldots + x_{n-1})$ to the codewords of C.

III.  Main  results 

We use ∞ for the support of the parity-check control symbol of the extension of a Goppa code.

Let f be an element of the projective semi-linear group $P\Gamma L(2, K)$. f can be considered as a permutation of $K \cup \{\infty\}$:

$f(\zeta) = (a\zeta^{q} + b)/(c\zeta^{q} + d)$, $ad - bc \ne 0$, q a power of p.

Let $C_f$ be a union of orbits of elements of $K \cup \{\infty\}$ under f. Clearly, f induces a permutation of the support $C_f$.

1  Associated  Searcher,  projet  CODES,  INRIA-Rocquencourt, 
78153F  LE  CHESNAY,  FRANCE 


Theorem 1 Let $g(x) = \sum_{i=0}^{r} g_i x^{i}$ be a Goppa polynomial of degree r < n.

1) Let f be an element of $A\Gamma L(1, K)$ (i.e. $f(\zeta) = a\zeta^{q} + b$). If g satisfies $g(ax^{q} + b) = a^{r} g_r^{1-q} g(x)^{q}$, the Goppa code $Q(g, C_f)$ is invariant under f.

2) Let f be an element of $P\Gamma L(2, K)$, $\infty \notin C_f$. If g satisfies $g(a) \ne 0$ and $\sum_{i=0}^{r} g_i (ax^{q} + b)^{i} (cx^{q} + d)^{r-i} = g(a) g_r^{-q} g(x)^{q}$, then the parity-check subcode of the Goppa code $Q(g, C_f)$ is invariant under f.

3) Suppose that $C_f$ contains ∞ and C is $C_f$ without ∞. Let f be an element of $P\Gamma L(2, K)$. If g satisfies $g(a) \ne 0$ and $\sum_{i=0}^{r} g_i (ax^{q} + b)^{i} (cx^{q} + d)^{r-i} = g(a) g_r^{-q} g(x)^{q}$, then the extension of the Goppa code $Q(g, C)$ is invariant under f.

IV.  Application  to  the  construction  of 
quasi-cyclic  Goppa  codes 

In [2], we gave an algorithm for computing the polynomials g described in Theorem 1.

Choosing for the support C some union of orbits of the same length gives us quasi-cyclic codes.

We now give some non-exhaustive examples of parameters. All these codes meet the bound of best known codes. The order of quasi-cyclicity (i.e. the order of the quasi-cyclic permutation) is given as a subscript.

Goppa codes:

$[84, 70, 5]_{14}$, $[63, 51, 5]_{9}$, $[63, 39, 9]_{9}$, $[60, 48, 5]_{12}$, $[60, 36, 9]_{12}$, $[52, 40, 5]_{13}$, $[45, 27, 8]_{9}$, $[36, 18, 8]_{9}$.

Parity-check subcodes of Goppa codes:

$[98, 83, 6]_{14}$, $[84, 55, 10]_{14}$, $[84, 69, 6]_{14}$, $[70, 41, 10]_{14}$, $[56, 41, 6]_{14}$, $[36, 18, 8]_{18}$.

Extended Goppa codes:

$[84, 55, 10]_{14}$, $[84, 3, 48]_{14}$, $[70, 41, 10]_{14}$, $[63, 38, 10]_{9}$, $[54, 41, 6]_{18}$, $[54, 35, 8]_{18}$.

These codes are obtained for K = GF(128) or K = GF(64) using MAGMA.
Abstract —

We use Gröbner bases of modules to construct and classify quasicyclic codes. Whereas previous studies have been mainly concerned with the 1-generator case, our results elucidate the structure of arbitrary quasicyclic codes and their duals. We include a complete characterisation of selfdual quasicyclic codes of index 2.

I. Introduction

The theory of Gröbner bases of modules has been applied [F95, F96, F97] to decoding Reed-Solomon codes, to scalar rational interpolation, and to various other problems, such as Padé approximation, that can be represented as solving systems of polynomial congruences. The structure of quasicyclic codes was explored by Seguin and others [CS, SD, SH]. We adopt a new approach based on the construction of a canonical Gröbner basis generating set for a quasicyclic code regarded as a submodule of $R^{\ell}$ where $R = F[x]/\langle x^{m} - 1 \rangle$.

NB: Throughout the paper the word "code" means "quasicyclic code".

II.  Basic  structure 

Let C be a code of length $\ell m$ and index $\ell$ over F, where $\ell$ is the smallest power of the cyclic shift operator under which C is invariant. By a coordinate permutation we obtain the polynomial representation of C as an R-submodule of $R^{\ell}$. The code C is the image of an F[x]-submodule $\tilde{C}$ of $F[x]^{\ell}$ containing $\tilde{K} = \langle (x^{m} - 1)e_i,\ i = 1, \ldots, \ell \rangle$ (where $e_i$ is the standard basis vector) under the natural homomorphism $\varphi : (a_1, \ldots, a_\ell) \mapsto (a_1 + \langle x^{m} - 1 \rangle, \ldots, a_\ell + \langle x^{m} - 1 \rangle)$. We use position-over-term (POT) order in $F[x]^{\ell}$, with $e_i > e_j$ for i < j.

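Theorem 1 The reduced Gröbner basis of $\tilde{C}$ is

$\mathcal{G} = \{g_i = (0, \ldots, 0, g_{ii}, g_{i,i+1}, \ldots, g_{i\ell}),\ i = 1, \ldots, \ell\}$

where

i. $g_{ii}$ is monic and $\deg g_{ki} < \deg g_{ii}$ for k < i

ii. $g_{ii}$ divides $x^{m} - 1$

iii. if $g_{ii} = x^{m} - 1$ then $g_i = (x^{m} - 1)e_i$.

The F-dimension of $F[x]^{\ell}/\tilde{C}$ is $\sum_{i=1}^{\ell} \deg g_{ii}$. If G is the polynomial matrix with rows $g_i$ then there is a matrix A satisfying $AG = GA = (x^{m} - 1)I$.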

Thus C has an R-generating set $\bar{\mathcal{G}}$ comprising the elements of a Gröbner basis $\mathcal{G}$ not mapped to zero under $\varphi$. We refer to this set of generators as a GB generating set of C (or RGB generating set as appropriate).

Corollary 2 The dimension of the code C with GB generating set $\{\varphi(g_i),\ i = 1, \ldots, \ell\}$ is $m\ell - \sum_{i=1}^{\ell} \deg g_{ii} = \sum_{i=1}^{\ell} (m - \deg g_{ii})$.

This makes it straightforward to enumerate the possible dimensions of codes, and thus, in principle, construct all possible codes of a given index.


Patrick  Fitzpatrick 
The set of vectors in $F^{\ell m}$ defined by $\{x^{s_i} g_i : i = 1, \ldots, \ell,\ s_i = 0, \ldots, m - \deg g_{ii} - 1\}$ is a basis of C. These form the rows of a block upper triangular generator matrix of C. Using the matrix A introduced in Theorem 1 we can derive a Gröbner basis representation and generator matrix of $C^{\perp}$.
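The dimension formula of Corollary 2 and the index set of the shifted-generator basis depend only on m and the degrees of the diagonal entries. The Python sketch below makes that bookkeeping concrete; the function names and the example degrees are illustrative assumptions.

```python
def qc_dimension(m, diag_degrees):
    """Dimension sum_i (m - deg g_ii) of a quasicyclic code from its GB generating set."""
    return sum(m - d for d in diag_degrees)

def basis_shifts(m, diag_degrees):
    """Index pairs (i, s) of the basis vectors x^s * g_i, with s = 0..m - deg g_ii - 1."""
    return [(i + 1, s)
            for i, d in enumerate(diag_degrees)
            for s in range(m - d)]

if __name__ == "__main__":
    # Hypothetical index-2 example with m = 7: deg g_11 = 3, and g_22 = x^7 - 1
    # (degree 7), so the second generator is mapped to zero and contributes nothing.
    print(qc_dimension(7, [3, 7]))
    print(basis_shifts(7, [3, 7]))
```

Enumerating the admissible degree tuples, i.e. the degrees of divisors of $x^{m} - 1$, would then list every possible dimension for a given index.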

III.  Self-dual  codes  of  index  2 

We  write  xm  —  1  =  Flne/v  over  where  e  —  m/char  F, 
and divide the irreducible factors $f_n$, $n \in N$, into two types according to whether or not $f_n^{*} \sim f_n$ (where $u^{*}$ denotes the reciprocal of u, and $\sim$ means "is a constant multiple of"). Let $I \subset N$ be the set of indices of factors having this property. The others then fall into reciprocal pairs. Let $J \subset N$ be a set of indices comprising one element of each of these pairs and define $\pi : J \to N \setminus (I \cup J)$ by $f_j^{*} \sim f_{\pi(j)}$. Then $x^{m} - 1 \sim$

Y\fl  n/j  FI where  f*  ~  /i.  fj*  ~  fnUbfnU)*  ~  fi 

and  we  note  that  dfn(n)  =  dfrl.  Denote  the  monic  factor 
c  Tl  f i  '  II  fj3  w^ere  C  is  an  appropriate  constant, 

by  \otiy  otj ,  q,,  ( j )  ] . 

Theorem 3 The code C of index 2 is selfdual if and only if each minimal Gröbner basis of C has a generator matrix

(  K,Qj,Q„(j)]  v[ Pi,PjP„U)\  A 

V  0  l7i.7i>7„(j)l  J 

where 

i.  2 Qi  <  e,  Qj  +  olt  (,)  <  e 

ii.  2a i  <  e  +  Pi  -  7i,  aj  +  an  (j)  <  e  +  0,  -  ^ ,  aj  +  an  {j)  < 

£  +  PnU)  ~  7„(j) 

iii.  a,  +  7i  =  e, a3  +  q„0)  +  y,-  +  %u)  =  2 e 

,iv.  w’[2 Pi  -  2ai,(3j  +  pn{j)  -aj  -  avU),Pj  +  0AJ)  -  a,  - 
Q„0)!  =  1  mod  l27i 7j  +7„  (j)  -F +7„  (j)  where  v1  = 
—xrv,  r  =  ^  dfi{ai  -  ft)  +  dfjfa  +  anU)  -  ft  -  ft0)). 
In  the  special  case  (7< ,  Tj ,  7„  ( j ) ]  =  xm  -  1  the  RGB  generating 
set  of  C  is  { 1  v )  where  vv  =  —  1  mod  xm  —  1  and  dv  <  m. 
Abstract — Twelve new binary quasi-cyclic (QC) codes, which improve the lower bounds on minimum distances for binary linear codes, are presented, and a web database on best-known binary QC codes is constructed for public access.


I.  INTRODUCTION 

Quasi-cyclic (QC) codes are a generalization of cyclic codes whereby a cyclic shift of a codeword by p positions is still a codeword. Cyclic codes are a special case of QC codes with p = 1. It is known that QC codes contain many of the best-known linear codes [2,3].

Circulant matrices are the building blocks of the generator matrix of a QC code. A circulant matrix can be specified by a polynomial whose coefficients form the first row. If m is the dimension of the matrix, then the block length of the QC code is n = mp, where p is the number of circulants for the code.
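As a small illustration of how a polynomial specifies a circulant, the Python sketch below builds the m x m matrix from its first row, each later row being the previous one shifted cyclically by one position to the right. The shift direction and the function name are assumptions made for the example.

```python
def circulant(first_row):
    """Return the circulant matrix whose i-th row is first_row shifted right by i."""
    m = len(first_row)
    return [first_row[-i:] + first_row[:-i] for i in range(m)]

if __name__ == "__main__":
    # Coefficients of 1 + x for m = 3, listed from degree 0 upward.
    for row in circulant([1, 1, 0]):
        print(row)
```

Placing p such circulants side by side gives a generator matrix with block length n = mp.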

Let $g_0(x), g_1(x), \ldots, g_{p-1}(x)$ be p generator polynomials for the QC [mp, k] code. Then its generator matrix can be defined by

$G = (g_0(x), g_1(x), \ldots, g_{p-1}(x))$    (1)

Let

$h(x) = (x^{m} - 1) / \gcd\{x^{m} - 1, g_0(x), g_1(x), \ldots, g_{p-1}(x)\}$    (2)

Then k, the dimension of the QC code, is equal to the degree of h(x). In this paper, only binary codes are discussed.
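Equation (2) can be evaluated mechanically for binary codes. The Python sketch below represents a polynomial over GF(2) as an integer bitmask (bit i holds the coefficient of x^i) and returns deg h(x) = m - deg gcd; the helper names and the example polynomials are assumptions, not the generators of the codes in this paper.

```python
def deg(a):
    """Degree of a GF(2) polynomial stored as a bitmask (deg(0) is -1)."""
    return a.bit_length() - 1

def pmod(a, b):
    """Remainder of a divided by b over GF(2)."""
    while a and deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def pgcd(a, b):
    """Greatest common divisor of two GF(2) polynomials (Euclid's algorithm)."""
    while b:
        a, b = b, pmod(a, b)
    return a

def qc_dimension(m, generators):
    """k = deg h(x), with h(x) = (x^m - 1) / gcd(x^m - 1, g_0, ..., g_{p-1})."""
    g = (1 << m) | 1  # x^m + 1, which equals x^m - 1 over GF(2)
    for poly in generators:
        g = pgcd(g, poly)
    return m - deg(g)

if __name__ == "__main__":
    # m = 7: g_0 = x^3 + x + 1 and g_1 = (x + 1)(x^3 + x + 1) = x^4 + x^3 + x^2 + 1.
    print(qc_dimension(7, [0b1011, 0b11101]))
```

Both example generators share the factor $x^3 + x + 1$ of $x^7 - 1$, so the gcd has degree 3 and the code has dimension 4.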

II.  NEW  BINARY  QUASI-CYCLIC  CODES 

Computer search has proved to be an effective method for finding good QC codes, and many QC codes improving the lower bounds on minimum distance have been found [3].

The technique used in this paper was first presented in [1]. Some refinements to reduce the complexity are introduced, and special attention is paid to the case m > 32.

With this approach, twelve new QC codes which improve the lower bounds on the minimum distance [2] have been constructed, and many other QC codes which are better than previously known QC codes or as good as the best-known codes have been obtained [3]. Table 1 shows the parameters of the twelve new QC codes. The column lb - ub gives the previously known lower and upper bounds on the minimum distance of binary linear codes from the database maintained by Professor Brouwer [2]. The author [3] maintains a web database of binary QC codes (including weight distributions). This database is searchable by block length n, code dimension k, circulant matrix size m, parameter p, and contributor, or any combination of them. To save space, the generator polynomials are omitted from the paper; they can be found in the database.


TABLE 1 NEW QC CODES THAT IMPROVE THE LOWER BOUNDS ON MINIMUM DISTANCE OF A BINARY LINEAR CODE

QC Code      p    m    d    lb-ub
[112, 13]    4    28   48   46-50
[99, 20]     3    33   34   33-39
[102, 17]    3    34   38   37-42
[164, 20]    4    41   62   61-72
[225, 19]    5    45   90   89-102
[153, 16]    3    51   64   62-68
[153, 18]    3    51   62   59-66
[204, 18]    4    51   82   80-92
[165, 20]    3    55   64   62-72
[165, 21]    3    55   62   61-72
[220, 20]    4    55   88   86-97
[220, 21]    4    55   86   85-96

In [4], a binary QC [102, 17] code with d = 37, m = 17 and p = 6 was found. As shown in Table 1, a binary QC [102, 17] code with d = 38, m = 34, p = 3 and generator polynomials 607703, 11774425325, 4411577731 exists.
Abstract — A sequence of q-ary cyclic codes is considered. For each finite field GF(q), q ≥ 4, there is a code with parameters [n, k, d; q] = [q(q - 1) + 1, q(q - 1) - 6, 6; q]. We show that all these codes are n-, k- and d-optimal, with only one exception. Also the dual codes are considered. Their true minimum distances are calculated in the range 4 ≤ q ≤ 29.

I.  Introduction 

Standard terminology from coding theory is used. Cyclic codes are identified with ideals in the ring $GF(q)[x]/(x^{n} - 1)$. The set I consisting of all roots of the generator polynomial g(x) of a given code is referred to as a defining set. The minimal polynomial of $\alpha^{i}$, where α is a primitive n-th root of unity in an extension field of GF(q), will be denoted $m_i(x)$. Detailed proofs of the given statements can be found in [6].

II.  The  codes 

Let q be a power of a prime. Denote by $C_q$ the cyclic code of length n = q(q - 1) + 1 over GF(q) with generator polynomial $g(x) = m_0(x)m_1(x)$. The codes $C_2$ and $C_3$ are trivial, consisting of the all-zero codeword only. Since $n \mid (q^{6} - 1)$, we have $\alpha \in GF(q^{6})$ and the defining set I of $C_q$ equals $\{1, \alpha, \alpha^{q}, \alpha^{q^{2}}, \ldots, \alpha^{q^{5}}\}$. Thus for q ≥ 4 the codes $C_q$ are q-ary BCH codes [1, 2] of dimension k = q(q - 1) - 6 and designed minimum distance 4. But the true minimum distance is actually 6 for all prime powers q ≥ 4.
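The dimension and the designed distance stated above can be checked by listing the exponents in the defining set. The Python sketch below works with exponents modulo n, builds the q-cyclotomic coset of 1, adds the exponent 0, and takes the longest run of cyclically consecutive exponents plus one as the BCH designed distance. The helper names are assumptions, and only runs with step 1 are considered.

```python
def cyclotomic_coset(q, n, s):
    """Exponents s, s*q, s*q^2, ... reduced modulo n."""
    coset, x = set(), s % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

def longest_cyclic_run(exponents, n):
    """Length of the longest run of consecutive exponents modulo n."""
    if len(exponents) == n:
        return n
    best = 0
    for start in exponents:
        if (start - 1) % n in exponents:
            continue  # not the first element of a run
        length = 1
        while (start + length) % n in exponents:
            length += 1
        best = max(best, length)
    return best

def code_parameters(q):
    """(n, k, designed distance) of the cyclic code with generator m_0(x) m_1(x)."""
    n = q * (q - 1) + 1
    defining = {0} | cyclotomic_coset(q, n, 1)
    return n, n - len(defining), longest_cyclic_run(defining, n) + 1

if __name__ == "__main__":
    for q in (4, 5, 7):
        print(q, code_parameters(q))
```

For q = 4 the defining exponents are {0, 1, 3, 4, 9, 10, 12} modulo 13, and the run 12, 0, 1 gives designed distance 4.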

Theorem 1 For every prime power q ≥ 4 the code $C_q$ has minimum distance six, i.e. $C_q$ has parameters [q(q - 1) + 1, q(q - 1) - 6, 6; q].

In the proof, results of Roos [3] and of Van Lint and Wilson [4, p. 28], together with sphere packing arguments, are used.

Define $D_q(n,k)$ and $K_q(n,d)$ to be the maximal value of d and k, respectively, for which an [n, k, d; q] code exists. Furthermore, let $N_q(k,d)$ denote the minimal value of n for which an [n, k, d; q] code exists. A code is said to be d-, k- or n-optimal if, respectively, $d = D_q(n,k)$, $k = K_q(n,d)$ or $n = N_q(k,d)$. The following statement shows that the codes $C_q$, q ≥ 4, are d-, k- and n-optimal with only one exception.

Theorem 2 The following equalities hold.

(i) $D_q(q(q - 1) + 1, q(q - 1) - 6) = 6$ for q ≥ 4;

(ii) $K_q(q(q - 1) + 1, 6) = q(q - 1) - 6$ for q ≥ 4;

(iii) $N_q(q(q - 1) - 6, 6) = q(q - 1) + 1$ for q ≥ 5.
Nevertheless, for fixed minimum distance d and redundancy r = n - k, the codes $C_q$ do not have maximal information rate R = k/n = 1 - r/n for all q ≥ 4. It is known that there exist [20, 13, 6; 4] and [28, 21, 6; 5] codes providing two (the only known) examples (see [5, p. 420, 435]) of this fact.

III.  The  dual  codes 

Let $C_q^{\perp}$ denote the $[q(q - 1) + 1, 7, d^{\perp}; q]$ dual code of $C_q$. Then $C_q^{\perp}$ has defining set $\{\alpha^{2}, \alpha^{3}, \ldots, \alpha^{q-2}, \alpha^{q+1}, \alpha^{q+2}, \ldots, \alpha^{q^{2}-2q}, \alpha^{q^{2}-2q+3}, \alpha^{q^{2}-2q+4}, \ldots, \alpha^{q^{2}-q-1}\}$. Thus they are BCH codes with designed minimum distance $d_{BCH}^{\perp} := q^{2} - 3q + 1$. Using the computer software MAGMA the true minimum distance $d^{\perp}$ of $C_q^{\perp}$ has been calculated for q < 32. The result is presented in Table 1 together with $d_{BCH}^{\perp}$.
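The designed distance of the dual codes can be checked in the same exponent arithmetic. The Python sketch below assumes the standard fact that the defining set of the dual of a cyclic code consists of the negatives of the exponents outside the original defining set, and confirms that the longest run of consecutive exponents has length q^2 - 3q. Helper names are illustrative.

```python
def cyclotomic_coset(q, n, s):
    """Exponents s, s*q, s*q^2, ... reduced modulo n."""
    coset, x = set(), s % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

def longest_cyclic_run(exponents, n):
    """Length of the longest run of consecutive exponents modulo n."""
    if len(exponents) == n:
        return n
    best = 0
    for start in exponents:
        if (start - 1) % n in exponents:
            continue  # not the first element of a run
        length = 1
        while (start + length) % n in exponents:
            length += 1
        best = max(best, length)
    return best

def dual_parameters(q):
    """(n, dimension, designed distance) of the dual of the code with generator m_0 m_1."""
    n = q * (q - 1) + 1
    defining = {0} | cyclotomic_coset(q, n, 1)
    dual_defining = {(-e) % n for e in range(n) if e not in defining}
    return n, n - len(dual_defining), longest_cyclic_run(dual_defining, n) + 1

if __name__ == "__main__":
    for q in (4, 5, 7, 8, 9):
        print(q, dual_parameters(q), q * q - 3 * q + 1)
```

The sketch only reproduces the designed distance; the true minimum distances reported in Table 1 require an actual weight computation such as the MAGMA runs mentioned above.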


Table 1: True minimum distance of $C_q^{\perp}$
We note that for q = 7, 8 and 9 the entries in our table improve the corresponding lower bounds given in [5, pp. 441-447].
Abstract — We design sequences of low-density parity check codes that provably perform at rates extremely close to the Shannon capacity. These codes are built from highly irregular bipartite graphs with carefully chosen degree patterns on both sides. We further show that under suitable conditions the message densities fulfill a certain symmetry condition which we call the consistency condition and we present a stability condition which is the most powerful tool to date to bound/determine the threshold of a given family of low-density parity check codes.

I.  Introduction 

In this paper we present irregular low-density parity check (LDPC) [1,4] codes which exhibit performance extremely close to the best possible as determined by the Shannon capacity formula. These codes are characterized by their degree sequence pair $(\lambda(x), \rho(x))$ [2] and a random choice of the connections. For the additive white Gaussian noise channel (AWGNC) the best code of rate one-half presented in this paper has a threshold within 0.06 dB from capacity, and simulation results show that our best LDPC code of length $10^{6}$ achieves a bit error probability of $10^{-6}$ less than 0.13 dB away from capacity, beating even the best (turbo) codes known so far.
Figure 1: Comparison of (3,6)-regular LDPC code, turbo code, and optimized irregular LDPC code. All codes are of length $10^{6}$ and of rate one-half. The bit error rate for the AWGNC is shown as a function of $E_b/N_0$ (in dB), the standard deviation σ, as well as the raw input bit error probability $P_b$.

II.  Analytic  Properties  of  Density  Evolution 

Assume we employ a message passing decoder on an infinitely long LDPC code. Let $P_\ell$ denote the distribution of messages emitted from the variable nodes at the $\ell$-th iteration assuming that the all-one codeword was transmitted. The sequence of distributions $P_\ell$ and their determination is collectively referred to as density evolution [2].


In the following, we call a distribution f on R consistent if it satisfies $f(x) = f(-x)e^{x}$ for all $x \in R^{+}$. For example, a Gaussian density is consistent iff its mean μ and variance $\sigma^{2}$ are related by $\sigma^{2} = 2\mu$. The following theorem can often be used to achieve significant speed-ups and improved accuracy in the determination of these message distributions.
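The Gaussian example can be verified numerically. For a Gaussian density the ratio f(x)/f(-x) equals exp(2μx/σ²), which reduces to e^x exactly when σ² = 2μ. The Python sketch below checks the consistency condition at a single point; the function names and test values are illustrative.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a Gaussian with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def is_consistent_at(x, mean, variance, rel_tol=1e-9):
    """Check the consistency condition f(x) = f(-x) * e^x at one point x > 0."""
    lhs = gaussian_pdf(x, mean, variance)
    rhs = gaussian_pdf(-x, mean, variance) * math.exp(x)
    return math.isclose(lhs, rhs, rel_tol=rel_tol)

if __name__ == "__main__":
    print(is_consistent_at(1.3, mean=2.0, variance=4.0))  # variance = 2 * mean
    print(is_consistent_at(1.3, mean=2.0, variance=3.0))  # variance != 2 * mean
```

A pointwise check like this is only a sanity test; the condition has to hold for every positive x.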

Theorem 1 Suppose that a binary-input channel has symmetry property p(y | x = 1) = p(-y | x = -1). Under the all-one codeword assumption let $P_\ell$ denote the message distribution of a belief-propagation decoder at the $\ell$-th iteration, where all messages are assumed to be in log-likelihood ratio form. Then $P_\ell$ is consistent.

Assume that after some iterations the number of remaining errors is fairly small. Will the number of errors converge to zero if we proceed with further iteration rounds or will it stay bounded away from zero regardless of the number of iterations? This is answered in

Theorem 2 Let g(s) be the moment generating function corresponding to the initial message distribution $P_0(x)$, i.e., $g(s) = E_{P_0}[e^{sX}]$, and assume that g(s) < ∞ for all s in some neighborhood of zero. Define $r = -\log(\inf_{s<0} g(s))$, which for consistent initial message distributions $P_0$ simplifies to $r = -\log(2\int_0^{\infty} P_0(x) e^{-x/2}\,dx)$. If $\lambda'(0)\rho'(1) > e^{r}$, then the probability of error of density evolution is strictly bounded away from 0. Conversely, if $\lambda'(0)\rho'(1) < e^{r}$, then there exists ε > 0 such that if density evolution is initialized with a consistent message distribution P satisfying $P_{err}(P) < \epsilon$, then the probability of error will converge to zero under density evolution.


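For the binary erasure channel, the binary symmetric channel and the additive white Gaussian noise channel we have $e^{r} = 1/\delta$, $e^{r} = 1/(2\sqrt{\delta(1-\delta)})$, and $e^{r} = e^{1/(2\sigma^{2})}$, respectively, where δ denotes the erasure or crossover probability and σ the noise standard deviation.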
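The stability condition of Theorem 2 is easy to evaluate once e^r is known for the channel. The Python sketch below uses the standard closed forms for the three channels (erasure probability and crossover probability both written as delta, noise standard deviation sigma); these parameter names and the helper functions are assumptions made for illustration.

```python
import math

def er_bec(delta):
    """e^r for the binary erasure channel with erasure probability delta."""
    return 1.0 / delta

def er_bsc(delta):
    """e^r for the binary symmetric channel with crossover probability delta."""
    return 1.0 / (2.0 * math.sqrt(delta * (1.0 - delta)))

def er_awgn(sigma):
    """e^r for the binary-input AWGN channel with noise standard deviation sigma."""
    return math.exp(1.0 / (2.0 * sigma ** 2))

def is_stable(lambda_prime_0, rho_prime_1, er):
    """Stability condition lambda'(0) * rho'(1) < e^r."""
    return lambda_prime_0 * rho_prime_1 < er

if __name__ == "__main__":
    # (3,6)-regular ensemble: lambda(x) = x^2, rho(x) = x^5, so lambda'(0) = 0.
    print(is_stable(0.0, 5.0, er_awgn(0.88)))
    # Hypothetical ensemble with lambda'(0) = 0.3 and rho'(1) = 7 on a BSC.
    print(is_stable(0.3, 7.0, er_bsc(0.05)))
```

An ensemble without degree-two variable nodes has λ'(0) = 0 and therefore always satisfies the condition.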


III.  Optimization 

By optimizing the degree sequence pair $(\lambda(x), \rho(x))$ we have found ensembles of irregular LDPC codes with thresholds extremely close to capacity for a wide range of rates and channels [3].
I. Introduction

LDPC codes [1] with iterative decoding based on belief-propagation (IDBP) have been shown to achieve astonishing error performance [2]. But no algebraic or geometric method has been found for constructing these codes. Codes that have been found are largely computer generated, especially long codes. In this paper, we present two classes of high rate LDPC codes whose constructions are based on the lines of two-dimensional finite Euclidean and projective geometries, respectively.

II.  Codes  Constructed  Based  on 
Two-dimensional  Finite  Geometries 

Regard the Galois field $GF(2^{2s})$ as the two-dimensional Euclidean geometry $EG(2, 2^{s})$ over $GF(2^{s})$ [3]. Let α be a primitive element of $GF(2^{2s})$. Then $\alpha^{\infty} = 0, \alpha^{0} = 1, \alpha^{1}, \alpha^{2}, \ldots, \alpha^{2^{2s}-2}$ form all the points of $EG(2, 2^{s})$. The zero element 0 is called the origin of $EG(2, 2^{s})$. Every line in $EG(2, 2^{s})$ consists of $2^{s}$ points. For a given point $\alpha^{i}$ in $EG(2, 2^{s})$, there are $2^{s} + 1$ lines that intersect at $\alpha^{i}$. Let $v = (v_0, v_1, \ldots, v_{2^{2s}-2})$ be a $(2^{2s} - 1)$-tuple over GF(2). Number the components of v with the nonzero elements of $GF(2^{2s})$ as follows: the component $v_i$ is numbered $\alpha^{i}$ for $0 \le i \le 2^{2s} - 2$. Hence, $\alpha^{i}$ is the location number of $v_i$. Let $\mathcal{L}$ be a line in $EG(2, 2^{s})$ that does not pass through the origin $\alpha^{\infty}$. Based on $\mathcal{L}$, form a binary $(2^{2s} - 1)$-tuple as follows: $v_{\mathcal{L}} = (v_0, v_1, \ldots, v_{2^{2s}-2})$ whose i-th component $v_i$ is 1 if and only if its location number $\alpha^{i}$ is a point on $\mathcal{L}$. This vector $v_{\mathcal{L}}$ is called the incidence vector of line $\mathcal{L}$. Now form a $(2^{2s} - 1) \times (2^{2s} - 1)$ matrix H with $v_{\mathcal{L}}$ and its $2^{2s} - 2$ cyclic shifts as rows. The rows of H are the incidence vectors of the $2^{2s} - 1$ distinct lines in $EG(2, 2^{s})$ which do not pass through the origin, and the columns of H correspond to the $2^{2s} - 1$ non-origin points of $EG(2, 2^{s})$. The ratio of the total number of ones to the total number of entries in the H matrix, called the density, is $r = 2^{s}/(2^{2s} - 1)$. Let C be the null space of H. Then C is an LDPC code of length $n = 2^{2s} - 1$. It is cyclic and its generator polynomial is completely characterized by its roots in $GF(2^{2s})$. It has $n - k = 3^{s} - 1$ parity check bits and a minimum distance $d_{min} = 2^{s} + 1$.
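The code parameters quoted above are simple functions of s. The Python sketch below tabulates them; the function name is illustrative.

```python
def eg_ldpc_parameters(s):
    """Return (n, k, d_min, density) of the two-dimensional EG-LDPC code over GF(2^s)."""
    n = 2 ** (2 * s) - 1      # code length: number of non-origin points
    parity = 3 ** s - 1       # number of parity check bits n - k
    k = n - parity
    d_min = 2 ** s + 1        # minimum distance
    density = 2 ** s / n      # fraction of ones in H
    return n, k, d_min, density

if __name__ == "__main__":
    for s in (2, 3, 4, 5, 6):
        print(s, eg_ldpc_parameters(s))
```

For s = 6 this gives the (4095, 3367) code with minimum distance 65.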

Similarly, LDPC codes can also be constructed based on the lines of the two-dimensional projective geometry $PG(2, 2^{s})$. This construction results in a class of PG-LDPC codes which are also cyclic.

Since both EG- and PG-LDPC codes are cyclic, their encoding is extremely simple. This is in contrast to the complex encoding of long computer generated LDPC codes. For iterative decoding of these cyclic LDPC codes, error detection at the end of each decoding iteration can also be achieved easily with a simple shift register.
III. Extension and Puncturing
A 2-dimensional EG- or PG-LDPC code can be extended by splitting each column h of its parity-check matrix H into q columns, $h_1, h_2, \ldots, h_q$, with the "ones" of h distributed among $h_1, h_2, \ldots, h_q$ (evenly or not evenly). This results in a low-density matrix $H_{ext}$ with $q(2^{2s} - 1)$ columns and density $r = 2^{s}/(q(2^{2s} - 1))$. The null space $C_{ext}$ of $H_{ext}$ is also an LDPC code and is quasi-cyclic. Finite geometry LDPC codes can also be punctured in various ways to obtain good LDPC codes. We can remove columns of the parity-check matrix H corresponding to the points on a line or a set of lines. Puncturing can also be achieved by a combination of removing columns and rows of H.
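Column splitting can be sketched as follows. The Python example distributes the ones of a column among q new columns in round-robin order, which is one possible "even" distribution; the rule and the function name are assumptions, since the text leaves the distribution open.

```python
def split_column(column, q):
    """Split a 0/1 column into q columns, assigning its ones in round-robin order."""
    parts = [[0] * len(column) for _ in range(q)]
    ones = [row for row, bit in enumerate(column) if bit]
    for idx, row in enumerate(ones):
        parts[idx % q][row] = 1
    return parts

if __name__ == "__main__":
    for part in split_column([1, 0, 1, 1, 0, 1], 2):
        print(part)
```

Each one of the original column lands in exactly one of the new columns, so the number of ones in H is preserved while the number of columns grows by a factor of q.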

IV.  Error  Performance 
EG- and PG-LDPC codes and their extended codes with
IDBP achieve very good performance. As an example, let
m = 2 and s = 6. There exists a (4095,3367) EG-LDPC code.
The error performance of this code with IDBP is shown in
Figure 1-(a). Suppose we split each column of the parity-check
matrix of this code into 16 columns. This column splitting
results in a (65520,61425) extended EG-LDPC code. Using
IDBP, this code achieves an error performance only 0.3 dB
away from the Shannon limit, as shown in Figure 1-(b).

Figure 1: (a) Bit- and frame-error probabilities of the
(4095,3367) EG-LDPC code. (b) Bit- and frame-error
probabilities of the (65520,61425) extended EG-LDPC
code.

Abstract - A statistical analysis of low-density convolutional
(LDC) codes is performed. This analysis
is based on the consideration of a special statistical
ensemble of Markov scramblers and the solution to a
system of recurrent equations describing this ensemble.
The results of the analysis are lower bounds for
the free distance of the codes and upper bounds for
the maximum likelihood decoding error probability.
For the case where the size of the scrambler tends
to infinity, some asymptotic bounds for the free distance
and the error probability are derived. Simulation
results for iterative decoding of LDC codes are
also presented.

Low-density convolutional (LDC) codes were introduced by
Jimenez and Zigangirov [1] and the theory of these codes was
further developed in the first part of the paper [3]. LDC
codes share some features with low-density block codes,
invented by Gallager [2], but at the same time there are
differences, which arise from the recurrent nature of LDC
codes. In particular, the iterative decoding of
LDC codes can be performed by a pipeline implementation.

The main goal of this paper is to demonstrate that bounds
on the performance of LDC codes, similar to the bounds for
conventional convolutional codes, can be obtained by
introducing a special ensemble of Markov scramblers and
applying Markov chain theory (see also [4]).

We have studied two classes of LDC codes, A and B. In
class A, a rate $R_s = d/c$ convolutional scrambler is followed
by a rate $R_b = (d - 1)/d$ degenerated component convolutional
encoder of memory zero. (It calculates one parity-check
symbol for every d - 1 input symbols.) The resulting LDC code
is a homogeneous (d(1 - R), d)-code.


Fig.  1:  Lower  bounds  on  the  free  distance.  The  dashed  lines 
correspond  to  (from  bottom  to  top)  (2,4),  (2.5,5)  and  (3,6)  codes 
of  class  A.  The  solid  lines  correspond  to  class  B  codes  with 
component  code  memory  2,3,4  and  5. 


1This  work  was  supported  in  part  by  Swedish  Research  Council 
for  Engineering  Sciences  under  Grant  98-216. 


Fig. 2: Burst error probabilities for (3,6)-codes. The solid lines
show (from top to bottom) simulation results for
$m_s$ = 129, 257, 513, 1025, 2049, 4097. The size of the scrambler is
$M = 2.5(m_s - 1)$. The union bound (dashed-dotted) and the
expurgated bound (dotted) are shown for $m_s$ = 129. The vertical
dashed line shows the cut-off rate limit.

In class B, a rate $R_s = d/c$ convolutional scrambler is followed
by a rate $R_b = (d - c + b)/d$ component convolutional
encoder. To simplify the description, in this paper we consider
only rate R = 1/2 LDC codes.

In Fig. 1, lower bounds on the free distance of several
codes are given as a function of the scrambler size M.
It is worth noting that the bound grows linearly with M for
the LDC (3,6)-codes of class A and only logarithmically for
the other codes considered. Upper bounds on the burst error
probability, together with simulation results of iterative
decoding, are presented in Fig. 2. For the asymptotic case
we proved that there exists an LDC (3,6)-code (LDC (2,4)-code,
respectively) of memory size M for which the burst
error probability decreases at least as $O(1/M^2)$ ($O(1/M)$),
$M \to \infty$, for signal-to-noise ratios $E_b/N_0 > 2.63$ dB (3.58
dB). In analogy with conventional convolutional codes we can
call these limit values of $E_b/N_0$ the cut-off rate limits.

Abstract - We show that low-density parity-check
codes are random-like, and we comment on this result.

I.  Introduction 

Gallager’s low-density parity-check (LDPC) codes [1] attract
renewed interest. MacKay has recently shown that
LDPC codes are indeed good in a precise sense [2].

Why does the low density of some parity-check matrix result
in a good code, whereas most of its linearly equivalent
matrices are not of low density? We show that such codes
are actually random-like (RL), i.e., their weight distribution
resembles that obtained on average by random coding [3],
an intrinsic property of the code, not of a particular matrix.

II.  Density  in  a  Systematic-Form  Matrix 

Let a binary matrix M have n m-bit columns of the same
constant weight j > 1, but otherwise random and mutually
independent. Assuming M of full rank, it can be transformed
into an equivalent systematic matrix $M_{sys} = [Q \; I_m]$ ($I_m$ is
the identity matrix of order m, the submatrix Q has n - m columns
and m rows) using the Gaussian elimination process.

Let $p_i$ denote the average density of the n - i columns not
yet reduced to a single 1 after the i-th step of the elimination
process. At its last step (the m-th), these columns make
up submatrix Q so its density is $p_m$. Interpreting $p_i$ as the
probability of having a 1 at any given location, we obtain the
recursion formula:

$p_i = p_{i-1}\left[1 - 1/m + (1 + 2/m)\,p_{i-1} - 2\,p_{i-1}^2\right]$. (1)

Assume first that i may increase indefinitely. According to
(1), the asymptotic value $p_\infty$ of the density is a root of the
polynomial (1/2 - p)(p - 1/m). The right-hand side of (1) is
an increasing function of i for $1/m < p_{i-1} < 1/2$, so $p_\infty = 1/2$
provided $p_0 > 1/m$, but the increase in $p_i$ is limited by the
maximum number of steps m. For m approaching infinity,
$p_m$ thus approaches 1/2. For a finite value of m, $p_m$ is an
increasing function of $p_0 = j/m$. Even for the lowest possible
value j = 2, numerical computation shows that densities close
to 1/2 are obtained even for moderate values of m (e.g., for
m = 50 and m = 100, the computed density of Q with j = 2
is 0.49715 and 0.49999966, respectively). If $p_0 = 1/2$, then
$p_i = 1/2$ for all i. In any case, Q is random insofar as M itself is.
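
The numerical claim can be reproduced with a few lines of code. This is a minimal sketch that assumes recursion (1) in the form $p_i = p_{i-1}[1 - 1/m + (1 + 2/m)p_{i-1} - 2p_{i-1}^2]$, i.e., the form whose non-zero fixed points are the roots of (1/2 - p)(p - 1/m); the function name `systematic_density` is illustrative only.

```python
def systematic_density(m, j):
    """Iterate the density recursion for m elimination steps.

    m: number of rows of the matrix, j: constant column weight.
    Starts from p_0 = j / m and returns p_m, the predicted average
    density of the submatrix Q.
    """
    p = j / m
    for _ in range(m):
        p = p * (1 - 1 / m + (1 + 2 / m) * p - 2 * p * p)
    return p


for m in (50, 100):
    print(m, systematic_density(m, 2))
```

With j = 2 the iteration reproduces the density of about 0.49715 quoted for m = 50, and a value very close to 1/2 for m = 100.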

Since  the  proof  only  involves  average  densities,  it  applies 
as  well  to  non-constant  column  weight  matrices  provided  no 
column  weight  is  allowed  to  become  less  than  2. 

III.  Application  to  LDPC  and  Linear  RL  Codes 

Let the parity-check matrix of an (n, k) linear code be
a matrix M as in the previous section, resulting in an LDPC
code. The systematic matrix equivalent to it is:

$H_{sys} = [P^t \; I_{n-k}]$, (2)

where the superscript t denotes transposition. Then, P is
random with density close to 1/2 if n - k is large enough, as
will be assumed throughout. The corresponding systematic
generator matrix is $G_{sys} = [I_k \; P]$, with P the same as in (2).

The actual implementation of easily decodable LDPC codes
leads to additional constraints which may weaken the randomness
of P. Similarly, some constraints on the columns of $P^t$
will be needed for obtaining a large minimum distance (e.g.,
j = 2 results in $d_{min} = 3$, so larger j will be preferred).

For designing an (n, k) linear binary code at random, we
may choose each entry of its generator matrix G independently
of the others with probability 1/2. With high probability, we
thus obtain a matrix of rank k and effective length n (i.e., no
column weight is 0) which generates a code with a distance
distribution close to that obtained on average by random
coding. Assume furthermore that we demand that no column
weight of G is less than 2. Then, applying the result of section
II to G, submatrix P is random with density 1/2. Instead of
the full matrix G, it suffices to randomly generate the nonunity
submatrix P with density 1/2. But section II shows that we
may  as  well  restrict  ourselves  to  generate  at  random  a  low- 
density  parity-check  matrix  H ^  or  generator  Gy. 

For designing a random-like code, we have to replace truly
random binary variables by pseudo-random ones. They have
the average properties of random variables, but are generated
by deterministic means which make it possible to fulfil the
above conditions. We may still generate either the submatrix P with
density  1/2,  or  one  of  the  low-density  matrices  /fy  and  Gy. 
Doing so, we only obtain a weight distribution close to that
of random coding, i.e., a weakly RL code [3]. Additional
constraints may be needed to ensure a large minimum distance.

IV.  Remark  on  the  Weight  Distribution 

The normalized weight distribution of the codewords obtained
by drawing G at random is Bernoulli of mean n/2. When
P in the matrix $G_{sys}$ is drawn at random, almost the same
result is obtained. The unity submatrix and P both contribute
Bernoulli distributions, of means k/2 and (n - k)/2
respectively, as if the information and parity vectors were
independent. Of course, they are not, but their relation is so
complicated as to make them behave as if they were.
Specifying a submatrix P involves k(n - k) binary choices, so the
information and parity vectors are associated according to one
specific rule among $2^{k(n-k)}$ possible ones.

Abstract - We show that for the case of a binary
symmetric  channel  and  Gallager’s  decoding  algorithm 
A  the  threshold  can,  in  many  cases,  be  determined 
analytically.  We  further  present  optimal  codes  for  a 
large  range  of  rates. 

I.  Introduction 

Let $x_0$ be the expected fraction of initial errors, i.e., $x_0$ equals
the cross-over probability of the binary symmetric channel. It
was shown by Gallager [1] that the expected fraction of errors
(under the independence assumption) in the l-th iteration is
given by the recursion

$x_l = x_0 - x_0\,p^+(x_{l-1}) + (1 - x_0)\,p^-(x_{l-1})$, (1)

where

$p^+(x) := \lambda\left(\frac{1 + \rho(1 - 2x)}{2}\right)$,
$p^-(x) := \lambda\left(\frac{1 - \rho(1 - 2x)}{2}\right)$,

 d_l   d_r   Rate    Threshold
  3     6    0.5
  4     8    0.5
  5    10    0.5
  3     5    0.4     τ(3,5) ≈ 0.0611860546
  4     6    0.333
  3     4    0.25
Table  1:  Thresholds  for  the  binary  symmetric  channel 
and  Gallager’s  decoding  algorithm  A  for  some  standard 
regular  codes. 


and where $(\lambda(x), \rho(x))$ is the degree sequence pair.

The threshold $x_0^*$ is the supremum of all $x_0$ in $R^+$ such that
$x_l(x_0)$ as defined in (1) converges to zero as l tends to infinity.

II.  Exact  Thresholds 

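Lemma 1 Let $\tau$ denote the smallest positive real root of the
polynomial $p(x) := x\,p^+(x) + (x - 1)\,p^-(x)$ and assume that
$\lambda_2 \rho'(1) < 1$. Then $x_0^* \le \bar{x}_0 := \min\left\{\frac{1 - \lambda_2 \rho'(1)}{\lambda'(1)\rho'(1) - \lambda_2 \rho'(1)},\; \tau\right\}$.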

We note that, although one can construct counterexamples,
for most codes one has $x_0^* = \bar{x}_0$. Table 1 summarizes
thresholds of some standard regular codes, for all of which
one has $x_0^* = \bar{x}_0$.
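
As a worked illustration, the root $\tau$ and the behaviour of the recursion can be computed for a regular code. The sketch below assumes a regular $(d_l, d_r)$ code, so that $\lambda(x) = x^{d_l - 1}$ and $\rho(x) = x^{d_r - 1}$, and assumes recursion (1) with $p^\pm(x) = \lambda((1 \pm \rho(1 - 2x))/2)$; the function names and the scan-then-bisect root search are our own choices.

```python
def p_plus(x, dl, dr):
    """lambda((1 + rho(1 - 2x)) / 2) for a regular (dl, dr) code."""
    return ((1 + (1 - 2 * x) ** (dr - 1)) / 2) ** (dl - 1)


def p_minus(x, dl, dr):
    """lambda((1 - rho(1 - 2x)) / 2) for a regular (dl, dr) code."""
    return ((1 - (1 - 2 * x) ** (dr - 1)) / 2) ** (dl - 1)


def evolve(x0, dl, dr, iterations=500):
    """Run recursion (1) from x_0 and return the final error fraction."""
    x = x0
    for _ in range(iterations):
        x = x0 - x0 * p_plus(x, dl, dr) + (1 - x0) * p_minus(x, dl, dr)
    return x


def tau(dl, dr, step=1e-4):
    """Smallest positive root of x p+(x) + (x - 1) p-(x): scan upwards
    for the first sign change, then refine it by bisection."""
    def poly(x):
        return x * p_plus(x, dl, dr) + (x - 1) * p_minus(x, dl, dr)

    lo = step
    while lo + step < 0.5 and poly(lo + step) > 0:
        lo += step
    hi = lo + step
    for _ in range(60):
        mid = (lo + hi) / 2
        if poly(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(tau(3, 5))            # compare with the tau(3,5) entry of Table 1
print(evolve(0.060, 3, 5))  # started below tau: the error fraction dies out
print(evolve(0.065, 3, 5))  # started above tau: it does not
```

For the (3,5) code $\lambda_2 = 0$, so the first term of the minimum in Lemma 1 equals $1/(\lambda'(1)\rho'(1)) = 1/8$ and the bound is determined by $\tau$.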

III.  Optimal  Codes 

Given the ease with which thresholds can be determined, one
might wonder whether optimal codes for the given decoder
can be found. This is indeed the case. In a nutshell, one
can show that for a wide range of rates the optimal codes
for Gallager’s decoding algorithm A are left and right
concentrated, i.e., these codes have at most two non-zero (left
or right) degrees and these non-zero degrees are consecutive.
Fig. 1 shows the achievable thresholds as a function of the
rate for the optimal concentrated degree sequences. The solid
curve corresponds to the capacity formula $r = 1 - h(x_0^*)$. The
dashed curve corresponds to the optimal concentrated degree
sequence pairs. Note that over the whole range the optimal
concentrated codes can achieve roughly half of capacity. Our
main result now states that above a rate of roughly 2/5 these
optimal concentrated codes are optimal among all codes. This
implies that for these rates optimal codes and their thresholds
can be found analytically.

1This work was performed while the first author was a summer
intern at Bell Labs.

Figure 1: The solid curve corresponds to the capacity formula
$r = 1 - h(x_0^*)$ whereas the dashed curve corresponds
to the optimal concentrated codes.

Abstract - In the context of lattice quantization of a
generalized Gaussian source, with independent, identically
distributed signal values, a low complexity indexing
algorithm, based on a geometrical approach, is proposed.


I.  Introduction 


Signal vectors are distributed according to a probability
density function (pdf) of the kind $a \exp\left(-\left|x/b\right|^{p}\right)$. As a
codebook, we take the intersection of surfaces of constant pdf
with the cubic lattice.

Recently, Chen et al. [Chen97] proposed algorithms for
quantizing to the $Z^n$ lattice with a boundary well suited to this
pdf. Unfortunately, when p is different from 1 or 2, enumerating
or indexing lattice points proves difficult [Chen97].

Our main contribution in this work is to propose a low
complexity enumeration algorithm based on a geometrical
interpretation and valid for values of p in the range 0 < p < 2.
This point of view offers various advantages; in particular, it
enables one to reduce the algorithm to the calculation of a few
convolution products.


II.  Mathematical  Preliminaries 


The $L_p$-norm of the vector x is $L_p(x) = \left(\sum_i |x_i|^p\right)^{1/p}$. In
the subspace of the k first coordinates, we define the
$L_p$-sphere of center c and radius R

$\bar{S}_k(c, R) = \{x : L_p(x - c) \le R\}$ Eq. 1

and similarly the surface of the $L_p$-sphere, $S_k(c, R)$. We will
use the word sphere for both $\bar{S}_k(c, R)$ and $S_k(c, R)$. The
number p is omitted for the sake of simplicity of notation.


The (generalized) theta-function of the lattice $Z^n$ is the
generating function associated to the series $\left(\# S_n(k^{1/p})\right)_{k = 0 \ldots \infty}$:

$\theta_n(z) = \sum_{k=0}^{\infty} \# S_n(k^{1/p})\, z^k$. Eq. 2

We have

$\theta_{n+1}(z) = \theta_n(z)\,\theta_1(z)$. Eq. 3

This recursive formula enables one to derive all
theta-functions from $\theta_1(z)$. As a corollary, the term of index k
of $\theta_{n+1}(z)$ is the convolution product of the terms of
$\theta_1(z)$ and $\theta_n(z)$ up to the index k.
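
The convolution recursion can be sketched in a few lines. The example below is restricted, as an assumption made purely for illustration, to p = 1, where the energies of lattice points are integers and $\theta_1(z) = 1 + 2z + 2z^2 + \ldots$; the function names are ours, and a brute-force count is included as a cross-check.

```python
from itertools import product


def theta_1(kmax):
    """Coefficients of theta_1(z) up to z^kmax for p = 1: one point of
    energy 0 and two points (+k and -k) of each energy k >= 1."""
    return [1] + [2] * kmax


def convolve(a, b, kmax):
    """Convolution product of two coefficient lists, truncated at kmax."""
    out = [0] * (kmax + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= kmax:
                out[i + j] += ai * bj
    return out


def theta(n, kmax):
    """Coefficients of theta_n(z), obtained by n successive convolutions
    with theta_1(z), starting from theta_0(z) = 1."""
    coeffs = [1] + [0] * kmax
    for _ in range(n):
        coeffs = convolve(coeffs, theta_1(kmax), kmax)
    return coeffs


def brute_force_count(n, k):
    """Number of points of Z^n whose L1 energy is exactly k."""
    return sum(1 for x in product(range(-k, k + 1), repeat=n)
               if sum(abs(c) for c in x) == k)


print(theta(2, 3))   # [1, 4, 8, 12]
print(theta(3, 2))   # [1, 6, 18]
```

Each coefficient is obtained from a truncated convolution, which is the low-complexity ingredient that the indexing algorithm relies on.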

The coefficient of the generic term of the theta-function
counts the number of points with an energy $k = R^p$. It is easy to
derive an enumeration algorithm from a counting algorithm.
Assuming that points are ordered in some way, we associate
to a given point the number of points which precede it.
Here, we order points from the inside to the outside of spheres
and from the bottom to the top of the last axis.


III.  Principle  of  Coding 


We will focus on the index calculation at a given energy.

Any hyperplane $x_n = cte$, with cte an integer and $|cte| \le R$,
cuts $S_n(R)$ in an (n-1)-dimensional sphere of radius
$\left(R^p - |cte|^p\right)^{1/p}$. Thus, a
sphere in dimension n can be seen as a stack of spheres in
dimension n-1 along the last coordinate axis (say). This can be
written as

$S_n(R) = \bigsqcup_{|x_n| \le R} \left( S_{n-1}\left(\sqrt[p]{R^p - |x_n|^p}\right) + x_n e_n \right)$. Eq. 4

Hence, the (n-1)-dimensional spheres being disjoint, we can
deduce a recursive formula

$\# S_n(R) = \sum_{|x_n| \le R} \# S_{n-1}\left(\sqrt[p]{R^p - |x_n|^p}\right)$. Eq. 5

IV.  Algorithm 

Given a point to be numbered, $M(x_1, \ldots, x_n)$, set
$M_k(x_1, \ldots, x_k)$ and $R_k^p = \sum_{i=1}^{k} |x_i|^p$. We want to compute N,
the index associated to M. This index appears to be the sum of
the number of points below $M_k$ for every k. Thus we define a
function named Number which counts the points below $M_k$ in
the space of the k first coordinates. It proceeds by adding up the
cardinality of the layers under $M_k$. These cardinalities are
computed by means of theta-functions.

Abstract - A generalized notion of source divisibility,
or in other words successive refinement of information,
with the additional requirement of exponential
decrease of error probability is considered. A condition
necessary and sufficient for the possibility of such
successive refinement is established.


I.  Introduction 


The idea of source divisibility, or successive refinement of
information, was developed in works of Koshelev [1]-[3], Equitz
and Cover [4], Rimoldi [5] and other authors. We generalize the
concept by adding the requirement of reliability.

Let the probability distribution (PD) of messages of the
discrete memoryless source {X} be $P^* = \{P^*(x), x \in \mathcal{X}\}$, where a
finite set $\mathcal{X}$ is the alphabet of the source. Reproduction
alphabets of receivers are $\mathcal{X}^1$ and $\mathcal{X}^2$ and the corresponding
single-letter distortion measures are $d_k : \mathcal{X} \times \mathcal{X}^k \to [0, \infty)$, k = 1, 2.
Distortions $d_k(\mathbf{x}, \mathbf{x}^k)$ (k = 1, 2) between a source N-length
message vector $\mathbf{x}$ and its reproduced versions $\mathbf{x}^k$ are considered
as averages of per-letter distortions.


Fig. 1. Two-level communication system, with receiver
requirements $(E_1, \Delta_1)$ and $(E_2, \Delta_2)$.


A code $(f, F) = (f_1, f_2, F_1, F_2)$ for the system consists of
encoders $f_k : \mathcal{X}^N \to \{1, 2, \ldots, L_k(N)\}$, k = 1, 2, and
decoders $F_1 : \{1, 2, \ldots, L_1(N)\} \to (\mathcal{X}^1)^N$,

$F_2 : \{1, 2, \ldots, L_1(N)\} \times \{1, 2, \ldots, L_2(N)\} \to (\mathcal{X}^2)^N$.

The probabilities of the sets of source vectors $\mathbf{x}$ which are
reconstructed (using a code (f, F)) outside the permissible
distortion levels $\Delta_1$ and $\Delta_2$ at each destination are denoted by
$e_k(f, F, \Delta_k, N) = e_k$, k = 1, 2.

Let $E = (E_1, E_2)$, $\Delta = (\Delta_1, \Delta_2)$. A pair of rates $(R_1, R_2)$
($R_k \ge 0$, k = 1, 2) is called $(E, \Delta)$-achievable for reliabilities
$E_k > 0$ and distortion levels $\Delta_k \ge 0$, k = 1, 2, if for every $\varepsilon > 0$
and sufficiently large N there exists a code (f, F) such that
$N^{-1} \log L_k(N) \le R_k + \varepsilon$, $e_k \le \exp\{-N E_k\}$, k = 1, 2.


II.  Divisibility  of  source  with  reliability 

Let P be a PD on $\mathcal{X}$, $Q = \{Q(x^1, x^2 | x)\}$ be a conditional
PD on $\mathcal{X}^1 \times \mathcal{X}^2$ and $Q(x^k | x)$ be the corresponding marginal
PD. Denote by $D(P \| P^*)$ the divergence of PD P and $P^*$
and $\alpha(E_k) = \{P : D(P \| P^*) \le E_k\}$, k = 1, 2.


1This  work  was  supported  by  INTAS  Grant  94-469. 


Consider a function $\Phi(P, E, \Delta)$, values of which are
such conditional PD Q corresponding to a PD P that,
for a given $\Delta$, if $P \in \alpha(E_1)$ then $E_{P,Q}\, d_k(X, X^k) =
\sum_{x, x^k} P(x) Q(x^k | x)\, d_k(x, x^k) \le \Delta_k$, $x \in \mathcal{X}$, $x^k \in \mathcal{X}^k$, k = 1, 2,
and if $P \in \alpha(E_2) - \alpha(E_1)$ then the last inequality holds only
for k = 2. Let $\mathcal{M}(P, E, \Delta)$ be the collection of all such
functions $\Phi(P, E, \Delta)$ for given E, $\Delta$ and P.

The rate-reliability-distortion function $R(E_k, \Delta_k)$, k = 1, 2,
which is the minimal achievable rate for a one-terminal source
code ensuring reconstruction of messages with the requirement of
reliability $E_k$ and distortion level $\Delta_k$, is known [6]:

$R(E_k, \Delta_k) = \max_{P \in \alpha(E_k)} \; \min_{Q :\, E_{P,Q}\, d_k(X, X^k) \le \Delta_k} I_{P,Q}(X \wedge X^k)$.


Definition. Successive refinement from a level $(E_1, \Delta_1)$ to
$(E_2, \Delta_2)$, for $\Delta_1 \ge \Delta_2$ and $E_1 \le E_2$, is the $(E, \Delta)$-achievability
of the pair of rates $(R(E_1, \Delta_1),\; R(E_2, \Delta_2) - R(E_1, \Delta_1))$.


III.  Necessary  and  sufficient  condition 
Theorem. For the considered multilevel system (Fig. 1)
the successive refinement takes place iff there exist pairs of
PD $P_1 \in \alpha(E_1)$, $Q_1 \in \mathcal{M}(P_1, E, \Delta)$ and $P_2 \in \alpha(E_1)$, $Q_2 \in
\mathcal{M}(P_2, E, \Delta)$, such that

$R(E_1, \Delta_1) = I_{P_1, Q_1}(X \wedge X^1)$, $R(E_2, \Delta_2) = I_{P_2, Q_2}(X \wedge X^2)$,

and the random variables (RV) $X$, $X^2$, $X^1$ form a Markov
chain $X_{P_2} \to X^2 \to X^1$, where $X_{P_2}$ is the RV X with the
distribution $P_2$.

Corollary. When the receivers' requirements on reliability
are absent, i.e., $E_1 = E_2 \to 0$, the result of Equitz and Cover
from [4] follows.

I. Introduction

This work addresses the coding of finite binary sequences with
a finite amount of decoding computational resources. In this
setting, we propose a general coding methodology and discuss
its convergence properties.


II.  Resource  Bounded  Complexity  Distortion 


Consider Shannon’s traditional communication system where
the source decoder is replaced by a universal Turing machine
$\Psi$. $\Psi$ also denotes a recursive function from $P$ to
$B^n = \{0,1\}^n$ if programs are also represented
by binary sequences. By $\Psi^{t,s}(p)$, $p \in P$, we denote the
execution of program p on $\Psi$ using fewer than t execution
steps and fewer than s memory cells. In this setting, it is
natural to measure the performance of the encoder with
the Resource Bounded Complexity Distortion Function [3],
defined by

$C_D^{t,s}(x^n) = \frac{1}{n}\, K^{t,s}\left(Q_D^{t,s}(x^n)\right)$,

where $x^n \in B^n$, $K^{t,s}(\cdot)$ is the Resource Bounded Kolmogorov
Complexity [1] and
$Q_D^{t,s}(x^n) = \arg\min_{y^n \in B^n :\, d_n(x^n, y^n) \le D} K^{t,s}(y^n)$, D being
a distortion constraint according to a distortion measure
$d_n(\cdot, \cdot)$. There is an interesting equivalence between
$C_D^{t,s}(x^n)$ and R(D), the Rate Distortion Function. For a
stationary ergodic source with recursive probability measure $\mu$,
$\lim_{t,s \to \infty} \lim_{n \to \infty} C_D^{t,s}(x^n) = R(D)$, almost surely [3]. The
two limits in this statement show that this equivalence holds
only for infinite observations and that Shannon’s theory does
not bound the computational power of the decoder. As a
consequence, the coding of finite objects with decoding
computational bounds fits better in Kolmogorov’s algorithmic
framework and it becomes a recursive search for short descriptions.


III.  Genetic  Algorithms 

We focus on time complexity and follow [4] to transform every
program p into a new string of length n + c by stuffing a
new "no operation" symbol $n_{op}$ into p. Here c is a constant such that
$\forall x^n \in B^n$, $K^t(x^n) \le n + c$, where $K^t(x^n) = \lim_{s \to \infty} K^{t,s}(x^n)$.
Hence, the problem of encoding $x^n$ can be reduced to a search
problem in an exponentially large search space, excluding the
possibility of an exhaustive search. Genetic Programming
(GP) [4] is a very attractive solution to this but, to the best of
our knowledge, its convergence properties are not well
understood. Instead, we propose to use Genetic Algorithm (GA)
search techniques to identify good representations. The use
of the $n_{op}$ instruction allows us to modify the GP search into
a well understood GA search where all programs have the

same length. An evaluation metric, f(p), commonly called
the fitness measure, is used to assign to each program p of
the search space a score associated with the ability of p to
represent $x^n$. Let $I(\cdot)$ be the indicator function. f(p) =
HD(p)  >  D)^± f=fto  +  I(D(p)  <  D)(n  +  c  -  l(p)  +  1) 
where l(p) is the length of p (before stuffing symbols $n_{op}$),
$D(p) = d_n(x^n, \Psi^t(p))$ the distance between its output and $x^n$,
$D_{max} = \sup_{x^n, y^n \in B^n} \{d_n(x^n, y^n)\}$, and $\beta > 0$. This fitness
ranks programs based on distortion only, if the search operates
outside $B_{x^n}$, a ball of radius D centered at $x^n$. When
it operates inside $B_{x^n}$, it ranks programs based only on their
length. Clearly, elements inside $B_{x^n}$ have fitness greater than
elements outside $B_{x^n}$. With this measure, a typical GA process
uses three genetic search operators (crossover, mutation
and reproduction) to evolve generations of programs [4].

IV.  Convergence  Properties 

The GA process can be modeled by a Markov chain or, more
generally, by a quadratic dynamical system. The probability
of having an element with maximum fitness in the population
converges [2], and it can be shown that this probability
converges to 1 if the best individual in each generation
is always reproduced in the next. In general, convergence
to 1 can be guaranteed but this is not a sufficient
property. Another important point is the speed of convergence,
which is needed to justify the superiority of this approach
over the exhaustive search. It can be argued that the convergence
is fast. To see this, denote by $p_t$ the distribution at generation
t. Let $p_\infty$ be the stationary distribution. Define the mixing
time [2] as $\tau(\varepsilon) = \max_{p_0} \min\{t : \| p_{t'} - p_\infty \| \le \varepsilon, \; \forall t' \ge t\}$,
where $\| \cdot \|$ denotes the variation distance and $\varepsilon \in (0, 1]$.
It can be shown [2] that for a one-point crossover system,
$\tau(\varepsilon) \le (n + c) \ln(n + c) + (n + c) \ln \varepsilon^{-1}$.
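
To get a feeling for the orders of magnitude, the bound can be evaluated and compared with the size of the search space. This is a small arithmetic sketch that assumes the bound has the form $(n + c)\ln(n + c) + (n + c)\ln \varepsilon^{-1}$; the function name and the sample values of n, c and $\varepsilon$ are hypothetical.

```python
import math


def mixing_time_bound(n, c, eps):
    """Upper bound (n + c) ln(n + c) + (n + c) ln(1 / eps) on the mixing
    time of a one-point crossover system acting on strings of length n + c."""
    length = n + c
    return length * math.log(length) + length * math.log(1.0 / eps)


n, c, eps = 1000, 24, 0.01
print(mixing_time_bound(n, c, eps))   # number of generations, about 1.2e4
print((n + c) * math.log10(2))        # log10 of the 2^(n + c) binary strings
```

The bound grows only as (n + c) ln(n + c), whereas the number of candidate strings grows exponentially in n + c.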

Abstract - Karhunen-Loeve transforms (KLTs)
are the optimal orthogonal transforms for transform
coding of Gaussian sources. This well-known fact is
usually established with approximations from
high-resolution quantization theory. How high does the
rate have to be for these approximations to be accurate?
The minimum rate allocated to any component
should be at least about one bit. (The average
rate per component may be much higher.) Does
the rate actually have to be high for the KLT to be
optimal? No, the KLT is optimal more generally.
Two new, simple proofs of this fact are described.
They rely on a scale invariance property, but not on
high-resolution approximations or properties of optimal
fixed-rate quantization.

Let $\{x(n)\}_{n \in Z^+}$ be a sequence of independent, identically
distributed (i.i.d.), zero-mean Gaussian random vectors of
dimension N with covariance matrix $R_x = E[x x^T]$. In transform
coding, an orthogonal linear transform T is applied to
each source vector to get a vector of transform coefficients
$y = Tx$. The transform coefficients undergo fixed- or
variable-rate scalar quantization, yielding $\hat{y} = Q(y)$; a reproduction
vector is obtained by inverting the transform: $\hat{x} = T^{-1}\hat{y}$. The
fidelity of reproduction is measured by the mean-squared error
per component between the source vector and the reproduction:
$D = N^{-1} E\|x - \hat{x}\|^2$.

A transform that makes the transform coefficients uncorrelated
is called a Karhunen-Loeve transform (KLT). The optimality
of the KLT was first shown by Huang and Schultheiss
under assumptions of optimal fixed-rate quantization and a
mild, commonsense condition on the bit allocation. (Earlier
work by Kramer and Mathews did not involve quantization
and was not in an operational rate-distortion framework.)
Optimality of the KLT can also easily be established under
the assumption that each component quantizer has
distortion-rate performance described by

$D_i = c\,\sigma_i^2\,2^{-2R_i}$, (1)

where c is a constant and $\sigma_i^2$ is the variance of $y_i$. This
result relies on optimal, arbitrary-real bit allocation, which
is unrealistic.

At high rates, (1) is a good approximation of the performance
of entropy-coded uniform quantization (ECUQ). The
original intention of this work was to determine how high the
rate has to be for the KLT to be optimal or nearly optimal
when using ECUQ. Actually, there is no limitation on the rate
for the KLT to be optimal. Also, numerical calculations show
that bit allocations based on (1) are close to optimal when
each coefficient has a rate of at least one bit per sample.

Limits of high-resolution analysis: Lagrangian bit allocation
using (1) is easy because of the simple form of $\partial D_i / \partial R_i$.
Where (1) is accurate, the optimal allocation of bits results
in equal quantization step sizes for each transform coefficient.
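
The Lagrangian allocation can be made concrete with a short calculation. The sketch below assumes the standard high-resolution model $D_i = c\,\sigma_i^2\,2^{-2R_i}$ (our reading of (1)), for which equating the slopes $\partial D_i/\partial R_i$ gives $R_i = \bar{R} + \frac{1}{2}\log_2(\sigma_i^2 / \text{geometric mean of the variances})$; the function names are ours, negative rates are not excluded, and the variances 1 and 1/4 are the ones used in the figure discussed below.

```python
import math


def high_res_allocation(variances, avg_rate):
    """Real-valued bit allocation that equalizes dD_i/dR_i under the
    model D_i = c * var_i * 2**(-2 R_i), for a given average rate."""
    log_gm = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_rate + 0.5 * (math.log2(v) - log_gm) for v in variances]


def component_distortions(variances, rates, c=1.0):
    """Per-component distortions under the same high-resolution model."""
    return [c * v * 2.0 ** (-2.0 * r) for v, r in zip(variances, rates)]


variances = [1.0, 0.25]
rates = high_res_allocation(variances, avg_rate=2.0)
print(rates)                                     # [2.5, 1.5]
print(component_distortions(variances, rates))   # [0.03125, 0.03125]
```

All components end up with the same distortion, which for uniform quantizers at high resolution corresponds to equal step sizes.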


[Figure: panels (a) and (b)]


The accuracy of the derivative of (1) is assessed in Fig. (a).
With $\sigma_1^2 = 1$ and $\sigma_2^2 = 1/4$, optimal bit allocations are
compared to those obtained with equal quantization step sizes in
Fig. (b).

Optimality of the KLT: The optimality of the KLT holds
much more generally than previously published results
indicated. The new result below does not rely on optimal
fixed-rate quantization or high-resolution quantization theory.
Theorem [1] Assume that the distortion-rate performance
of a scalar quantizer applied to a component with variance $\sigma^2$
is $D = \sigma^2 f(R)$. Then a KLT is an optimal transform, i.e.,
for any given maximum rate, it minimizes the distortion.

We may assume that $f(\cdot)$ is nonincreasing; if $R_1 > R_2$ but
$f(R_1) > f(R_2)$, rate $R_1$ can be replaced in any purportedly
optimal solution by rate $R_2$. $f(\cdot)$ need not be convex.

Proof 1: Let T be any orthogonal transform. Suppose that
$R_i$ bits are allocated to transform coefficient $y_i$. Assume
$\sigma_i^2 > \sigma_j^2$ implies $R_i \ge R_j$; otherwise, the distortion can be
reduced by the permutation of T that swaps $y_i$ and $y_j$.

If the $(i,j)$ component of $R_y = T R_x T^T$ is nonzero for some
$i \neq j$, the Jacobi rotation that zeroes this value does not
increase the distortion. Repeating the process until convergence
(the classical Jacobi algorithm for computing eigendecompositions)
yields a KLT at least as good as $T$. ■
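The theorem can be illustrated in two dimensions, where every orthogonal transform (up to sign and permutation) is a rotation. The sketch below is a numerical check under stated assumptions, not the authors' proof: the covariance matrix, the rates, and the nonconvex staircase $f$ are all hypothetical choices.

```python
import math

def transform_distortion(theta, cov, f, rates):
    """Average distortion (1/2) * sum_i var_i * f(R_i) after rotating by theta."""
    (a, b), (_, c) = cov
    cs, sn = math.cos(theta), math.sin(theta)
    var1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn  # variance of y_1
    var2 = a + c - var1                                 # the trace is preserved
    return 0.5 * (var1 * f(rates[0]) + var2 * f(rates[1]))

def klt_angle(cov):
    """Rotation angle that diagonalizes cov, larger eigenvalue first."""
    (a, b), (_, c) = cov
    return 0.5 * math.atan2(2 * b, a - c)

def staircase(r):
    """A nonincreasing but nonconvex f, chosen only for illustration."""
    return 1.0 if r < 1 else 0.3 if r < 2 else 0.05

cov = [[2.0, 0.8], [0.8, 1.0]]   # hypothetical covariance R_x
rates = (3.0, 1.0)               # more bits for the larger-variance coefficient
d_klt = transform_distortion(klt_angle(cov), cov, staircase, rates)
d_grid = [transform_distortion(k * math.pi / 360, cov, staircase, rates)
          for k in range(360)]
```

No rotation on the grid gives a smaller distortion than the KLT, even though the chosen $f$ is not convex.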

Proof 2 (Telatar): This proof is based on elementary properties
of majorization [2]. The problem is to minimize the
function $D = N^{-1}\sum_{i=1}^{N} \sigma_i^2 f(R_i)$ by manipulating the $\sigma_i^2$'s
through the choice of $T$. Let $\sigma = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2) =
\mathrm{diag}(T R_x T^T)$. For a Hermitian matrix, the diagonal elements
are majorized by the eigenvalues, so $\sigma$ is majorized by a vector
$\lambda$ of eigenvalues of $R_x$. Now the majorization of $\sigma$ by $\lambda$
is equivalent to $\sigma$ being in the convex hull of the $N!$
permutations of $\lambda$. Thus, we are left with minimizing $D$ over the
convex polytope defined by the permutations of $\lambda$. In
minimizing a linear function over a convex polytope, the optimum
is always attained at a corner point. This establishes that the
optimal transform is a KLT. ■

Abstract — We derive the rate-distortion region for
the  two-channel  multiple  description  problem  on  sta¬ 
tionary  discrete  ergodic  and  nonergodic  sources  with 
alphabets  admitting  an  ergodic  decomposition.  The 
results  do  not  provide  a  single-letter  representation 
for  the  rate-distortion  region  on  i.i.d.  sources. 

I.  Introduction 

In multiple description (MD) source coding with two channels,
a source is described at two different rates, and each description
is sent over a separate channel to the receiver. Each
channel has some probability of breaking down, in which case
all of the data sent on that channel is lost. If only channel $i$
is working, the receiver makes reproduction $\hat X_i$ with average
distortion $D_i$ using the rate-$R_i$ description sent on channel $i$.
When both channels work, the receiver makes reproduction
$\hat X_{12}$ with average distortion $D_{12}$ using the description
provided by combining the information on both channels with
an additional rate-$R_{12}$ description. The descriptions of both
$\hat X_1$ and $\hat X_2$ are available when decoding $\hat X_{12}$; the additional
rate $R_{12}$ spent on $\hat X_{12}$ can be treated as refinement and split
arbitrarily between the two channels.

Other authors have found upper and lower bounds on the
achievable rate-distortion region (the set of achievable vectors
$(\mathbf R, \mathbf D) = (R_1, R_2, R_{12}, D_1, D_2, D_{12})$) for the two-channel
multiple description of an i.i.d. source. The bounds do not match
for all sources. We present a new achievability theorem and
matching converse giving the achievable rate-distortion region
for both ergodic and nonergodic sources. On i.i.d. sources, our
converse is similar to that of [1], and our achievability result
uses an existing achievability result from [2]. We use a
Lagrangian approach and closely parallel [3].

Both the converse and achievability proofs make use of
distributions with a property that we call n-block conditional
independence, defined in the following section. The use of
distributions with this property arises from the observation
that the bounds of [1] and [2] match when $\hat X_1$ and $\hat X_2$ are
conditionally independent given $X$. While such conditional
independence is not observed on a symbol-by-symbol basis, it
arises naturally when using n-dimensional codes since $\hat X_1^n$, $\hat X_2^n$
and $\hat X_{12}^n$ are all uniquely determined by $X^n$.

II.  N-Block  Conditional  Independence 

Let the elements of a one-sided infinite sequence $Y$ be
denoted $Y_1, Y_2, \ldots$. We divide these elements into n-blocks
as $Y(i) = Y_{(i-1)n+1}^{in}$. We say that distribution
$q(\hat x_{12}, \hat x_1, \hat x_2 \mid x)$ has n-block conditional independence
if $q(\hat x_{12}, \hat x_1, \hat x_2 \mid x) = \prod_{k=1}^{\infty} q^n(\hat x_{12}(k), \hat x_1(k), \hat x_2(k) \mid x(k))$, and
$q^n(\hat x_1^n, \hat x_2^n \mid x^n) = q^n(\hat x_1^n \mid x^n)\, q^n(\hat x_2^n \mid x^n)$. Define $\Gamma(n)$ to be the
set of all distributions $q(\hat x_{12}, \hat x_1, \hat x_2 \mid x)$ with n-block
conditional independence for a particular $n$, and let $\Gamma = \bigcup_{n=1}^{\infty} \Gamma(n)$.

III.  Results 

Let $A$ be a discrete source alphabet, and define $A^\infty$ to be
the set of one-sided sequences from $A$. Let $\mu$ be a stationary
source with alphabet $A^\infty$ and marginal $\mu^n$ on $A^n$. Let $\hat A$ be
a discrete reproduction alphabet, and let $\rho : A \times \hat A \to [0, \infty)$
be a nonnegative distortion measure. We assume that there
exists a reference letter $y^* \in \hat A$ such that $E_\mu \rho(X, y^*) = d^* < \infty$.
Define $\rho(x^n, y^n) = \sum_{i=1}^{n} \rho(x_i, y_i)$.

We denote an MD quantizer of blocklength $n$ by $Q^n =
(Q_1^n, Q_2^n, Q_{12}^n)$. For each $S$, $Q_S^n$ maps $A^n$ onto some finite
or countable set of codewords $C_S^n$ from $\hat A^n$. We assume that
the description of $\hat X_{12}^n$ made by $Q_{12}^n$ is a refinement of the
descriptions $\hat X_1^n$ and $\hat X_2^n$ made by $Q_1^n$ and $Q_2^n$ respectively,
since these individual descriptions are available to the decoder
when decoding $\hat X_{12}^n$. The codeword descriptions are assumed
to be uniquely decodable.

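The set $\mathcal R(\mu)$ of asymptotically achievable rate-distortion
vectors $(\mathbf R, \mathbf D)$ is, by a timesharing argument, a convex set,
and can be entirely characterized by its support functional

$$j(\alpha, \beta, \mu) = \inf_{(\mathbf r, \mathbf d) \in \mathcal R(\mu)} \sum_{S \in \mathcal M} (\alpha_S d_S + \beta_S r_S),$$

where $\mathcal M = \{\{1\}, \{2\}, \{12\}\}$.

The weighted rate-distortion function is defined as

$$J(\alpha, \beta, \mu) = \inf_n J_n(\alpha, \beta, \mu),$$

where

$$J_n(\alpha, \beta, \mu) = \inf_{q \in \Gamma(n)} \frac{1}{n} \Big[ \sum_{S \in \mathcal M} \alpha_S E \rho(X^n, \hat X_S^n) + \sum_{i=1}^{2} \beta_i I(X^n; \hat X_i^n) + \beta_{12} I(X^n; \hat X_{12}^n \mid \hat X_1^n, \hat X_2^n) \Big].$$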

Stationary Ergodic Sources: When $\mu$ is stationary and
ergodic, the following result holds.

Theorem 1: $j(\alpha, \beta, \mu) = J(\alpha, \beta, \mu)$.

Stationary Nonergodic Sources: When $\mu$ is stationary
and nonergodic, but an ergodic decomposition $\{\mu_x : x \in A^\infty\}$
of $\mu$ exists, then the following results hold.

Theorem 2: $J(\alpha, \beta, \mu) = \int J(\alpha, \beta, \mu_x)\, d\mu(x)$.

Theorem 3: $j(\alpha, \beta, \mu) = J(\alpha, \beta, \mu)$.

On the Rate-Distortion Region for Multiple Descriptions

The problem of source coding with multiple descriptions
(henceforth the multiple descriptions problem) was
first  posed  by  Gersho,  Witsenhausen,  Wolf,  Wyner,  Ziv 
and  Ozarow  at  the  1979  IEEE  Information  Theory  Work¬ 
shop.  Since  then,  this  problem  has  been  extensively  stud¬ 
ied.  El  Gamal  and  Cover  [1]  obtained  an  inner  bound  on 
the  rate-distortion  region  for  multiple  descriptions,  and 
showed  that  it  is  tight  for  the  case  of  deterministic  dis¬ 
tortion  measures  (see  [1],  Theorem  2).  Ozarow  [2]  showed 
that  this  inner  bound  is  also  tight  for  the  Gaussian  source 
with the squared error distortion. Furthermore, Ahlswede
[5] and Zhang and Berger [4] showed that the El Gamal-Cover
region  is  tight  for  the  case  of  no  excess  rate  for  the  joint 
description.  In  the  excess  rate  case,  Zhang  and  Berger 
[4]  showed  by  a  counterexample  that  the  El  Gamal-Cover 
region  is  not  tight  in  general.  How  to  establish  the  rate- 
distortion  region  for  multiple  descriptions  is  still  open. 
It  is  one  of  the  well  known  hard  problems  in  multiuser 
information  theory. 

In  this  paper,  we  study  the  problem  of  source  coding 
with  multiple  descriptions,  which  is  described  as  follows. 
For a discrete memoryless source $X$, there are two encoders
$\mathrm{E}_1$ and $\mathrm{E}_2$, and three decoders $\mathrm{D}_1$, $\mathrm{D}_2$ and $\mathrm{D}_0$.
The two encoders $\mathrm{E}_1$ and $\mathrm{E}_2$ describe the source $X$ at
respective rates $R_1$ and $R_2$. Decoder $\mathrm{D}_1$ receives the output
of encoder $\mathrm{E}_1$ only, and it can recover $X$ with distortion
$D_1$. Decoder $\mathrm{D}_2$ receives the output of encoder $\mathrm{E}_2$ only,
and it can recover $X$ with distortion $D_2$. Decoder $\mathrm{D}_0$
receives the outputs of both encoders $\mathrm{E}_1$ and $\mathrm{E}_2$, and it can
recover $X$ with distortion $D_0$. We show that if decoder
$\mathrm{D}_2$ (or $\mathrm{D}_1$) is required to recover a function of the source
$X$ perfectly in the usual Shannon sense, the El Gamal-Cover
inner bound on the rate-distortion region is tight.
As  a  corollary,  the  Rimoldi  [7]  rate-distortion  region  for 
successive  refinement  of  information,  the  Kaspi  [8]  rate- 
distortion  function  when  side-information  may  be  present 
at  the  decoder,  and  the  El  Gamal-Cover  [1]  achievable 
rate  region  for  multiple  descriptions  with  deterministic 
distortion  measures  can  all  be  obtained.  We  have  also 
obtained  a  new  outer  bound  on  the  rate-distortion  region 
which  enhances  the  outer  bound  due  to  Witsenhausen 
and Wyner [3]. This new outer bound implies some interesting
facts regarding the achievable rate-distortion vectors.
Finally, inspired by the problem of multiple descriptions
with deterministic distortion measures studied
by El Gamal and Cover [1], and the problem of symmetrical
multilevel diversity source coding studied by Roche,
Yeung, Hau and Zhang (see [9] and [10]), we pose a multilevel
diversity source coding problem for further study.

Abstract — We show the asymptotic attainability of
R(D)  without  assuming  reference  letters. 

I.  Introduction 

To show the attainability of $R(D)$ for a source $X$ with a
single-letter fidelity criterion $d(a^N, b^N) = \sum_{i=1}^{N} d(a_i, b_i)$, we
usually assume a reference letter $y^*$ such that

$$E[d(X_t, y^*)] < \infty. \quad (1)$$

Its importance is readily understood from the fact that most
known source coding theorems, except$^1$ possibly for [3] and [4],
rely on it or on a stronger assumption of bounded distortion.

We modify a scheme in [2] and prove a coding theorem
for a stationary abstract-alphabet source only assuming an
auxiliary source $W^N$ such that $E[d(X^N, W^N)] < \infty$.

II.  Rate-distortion  function 

The $N$th-order rate-distortion function is the infimum

$$R_N(D) = \inf \frac{1}{N} I(X^N; W^N) \quad (2)$$

over all $W^N$ such that $(1/N) E[d(X^N, W^N)] \le D$. It converges
to $R(D)$ as $N \to \infty$ whenever $X$ is stationary.

III.  Variable-rate  variable-distortion  coding 

We consider a reproduction code $C_r^N$ consisting of infinitely
many reproduction codewords $y_m^N \in B^N$, $m = 1, 2, \ldots$, and
an addressing code $C_a^N$ consisting of infinitely many binary
codewords $b_m$, $m = 1, 2, \ldots$. For each $m$, we let $\ell_m = |b_m|$ so that

$$\left(1 + \frac{1}{N}\right) \log m + \frac{1}{N} + \log N \le \ell_m < \left(1 + \frac{1}{N}\right) \log m + \frac{1}{N} + \log N + 1. \quad (3)$$

Then the $\ell_m$ satisfy Kraft's inequality, and hence we can assume
that $C_a^N$ is uniquely decodable.
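Kraft's inequality for such codeword lengths can be checked numerically. The sketch below assumes lengths of the form $\ell_m = \lceil (1 + 1/N) \log_2 m + 1/N + \log_2 N \rceil$, which is one reading of (3) and not necessarily the exact constants of the original; the function names are illustrative.

```python
import math

def codeword_length(m, n):
    """Smallest integer length meeting the assumed lower bound in (3)."""
    return math.ceil((1 + 1 / n) * math.log2(m) + 1 / n + math.log2(n))

def kraft_partial_sum(n, max_index):
    """Partial Kraft sum over the first max_index addressing codewords."""
    return sum(2.0 ** -codeword_length(m, n) for m in range(1, max_index + 1))

# Partial sums for a few blocklengths N; each should stay below 1.
partial_sums = {n: kraft_partial_sum(n, 20000) for n in (1, 2, 3, 8)}
```

The sum over $m$ behaves like a scaled zeta series with exponent $1 + 1/N$, which is why the $\log N$ term is needed to keep it below one.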

Our encoding scheme is as follows. For given $X^N = x^N$,
we first observe the outcome $W^N = w^N$ and then search for
the smallest $\hat m$ satisfying $d(x^N, y_{\hat m}^N) \le d(x^N, w^N)$. The
transmitted codeword is then $b_{\hat m} \in C_a^N$.

Let $D(x^N, w^N, C^N) \triangleq d(x^N, y_{\hat m}^N)/N$ and $R(x^N, w^N, C^N) \triangleq \ell_{\hat m}/N$
be the resulting distortion and rate, respectively.

IV.  Coding  theorem 

For an auxiliary source $W^N$, let $Y^N$ be an independent replica
of $W^N$ and construct a random ensemble of codes, $C^N$, by
selecting reproduction codewords randomly and independently
of each other. Let $\mathcal E$ be the expectation with respect to $C^N$.

Given $(X^N, W^N) = (x^N, w^N)$, let $R(x^N, w^N) = \mathcal E[R(x^N, w^N, C^N)]$.
Then, from (3), we have

$$R(x^N, w^N) < k_N \log \mathcal E[\hat m] + g_N + k_N, \quad (4)$$

where we let $k_N = (N+1)/N^2$ and $g_N = (\log N)/N$.

$^1$ [3] is for fixed-distortion coding and [4] does not consider $R(D)$.

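Let

$$F(\rho, \delta \mid a^N) \triangleq \Pr\Big\{ \tfrac{1}{N} i_{XW}(a^N; W^N) \le \rho \ \text{and}\ \tfrac{1}{N} d(a^N, W^N) \le \delta \Big\}. \quad (5)$$

Then, for $\rho = i_{XW}(x^N; w^N)/N + \Delta$, the right-hand side of (4)
is bounded by

$$\le k_N (N\rho + N\Delta + 1) + g_N + k_N \log \frac{1}{F(\rho + \Delta, \delta \mid x^N)} \quad (6)$$

and we have

$$E[R(X^N, W^N)] \le k_N [I(X^N; W^N) + N\Delta + 1] + g_N + k_N E\Big[\log \frac{1}{F(\rho + \Delta, \delta \mid X^N)}\Big]. \quad (7)$$

The last term is a two-dimensional Lebesgue-Stieltjes integral
in $(\rho, \delta) \in (-\infty, \infty) \times [0, \infty)$ and, using a certain upper bound
on it, we have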

Theorem 1 For $N > 3$, there exists $C^N$ such that

$E[D(X^N, W^N, C^N)] \le D$ and (8)

E[R(Xn,Wn,Cn)]  <  RN(D)  +  ~ 

,  3  +  2  log  e  +  ZRn  (D)  t  13  +  logJV  /nN 
+  N  +  N2  ■ 

V. A Remark on Time-Continuous Sources
Berger discussed the extension of coding theorems to
time-continuous sources in [1]. He argued therein that we must
extend our mathematical tools, which have proved
to be useful for time-discrete sources, to time-continuous
sources. Up to now, however, there seems to have been little
progress in the attempt to extend those mathematical tools,
such as AEP (asymptotic equipartition) theorems, to
time-continuous sources. The coding scheme proposed in this paper
can be extended to time-continuous sources since no facts from
ergodic theory are used.

Abstract — A communication system is considered,
where messages of $K$ correlated sources $X_1, \ldots, X_K$ are
encoded  by  a  common  encoder  and  two  secondary  en¬ 
coders.  At  each  receiver  it  is  demanded:  (i)  to  re¬ 
cover  the  messages  of  a  part  of  the  sources  within 
given  distortion  levels,  (ii)  to  keep  secret  the  outputs 
of  another  part  of  the  sources  for  receivers  connected 
to  the  secondary  encoders;  and  (iii)  to  disregard  the 
information  of  the  rest  of  the  sources.  It  is  required 
that for a given reliability $E > 0$ at all receivers the
error probabilities of the blocklength $N$ code do not
exceed $2^{-NE}$. Inner and outer bounds on the region
of  achievable  rates  are  established,  depending  on  the 
reliability  E  and  permissible  distortion  and  secrecy 
levels. 


I.  Introduction 


We  study  a  problem  of  common  encoding  of  K  correlated 
sources  for  transmission  to  three  destinations  with  respect 
to  fidelity,  security,  and  reliability  criteria  for  the  one-stage 
branching  communication  system  shown  in  the  figure.  The 
problem is a generalization of the one studied by Yamamoto
[1].


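Let $X_n$, $n = 1, \ldots, N$, be a sequence of $N$ discrete, independent,
identically distributed random vectors with $K$ components,
which represent messages of the $k$-th source at the $n$-th
moment, $k = 1, \ldots, K$, $n = 1, \ldots, N$, with values in the finite
sets $\mathcal X_k$, $k = 1, \ldots, K$, respectively. Let $\mathcal X_1 \times \cdots \times \mathcal X_K = \mathcal X$,
$(\mathcal X_1)^N \times \cdots \times (\mathcal X_K)^N = (\mathcal X)^N$. For each receiver $m = 0, 1, 2$ the
set of indexes of sources $\{1, \ldots, K\}$ is divided into three groups:
$\{1, \ldots, K\} = \mathcal G_r^m \cup \mathcal G_d^m \cup \mathcal G_s^m$, $\mathcal G_s^0 = \emptyset$. We denote by small
letters the corresponding values of random vectors and random
variables, such that $(x_{1,n}, \ldots, x_{K,n}) = x_n$, $(x_{k,1}, \ldots, x_{k,N}) = \mathbf x_k$,
$k = 1, \ldots, K$, $(\mathbf x_1, \ldots, \mathbf x_K) = \mathbf x$. Let $\hat X_{k,n}^m$ be the reconstruction
of the $n$-th message of the $k$-th source at the $m$-th receiver,
with values in a finite set $\hat{\mathcal X}_k^m$, respectively, $n = 1, \ldots, N$,
$k \in \mathcal G_r^m$, $m = 0, 1, 2$, $\hat{\mathcal X}_1^m \times \cdots \times \hat{\mathcal X}_K^m = \hat{\mathcal X}^m$. For messages
received at the outputs we use analogous notations, such
as $(\hat x_{1,n}^m, \ldots, \hat x_{K,n}^m) = \hat x_n^m$, $(\hat x_{k,1}^m, \ldots, \hat x_{k,N}^m) = \hat{\mathbf x}_k^m$, $k \in \mathcal G_r^m$,
$(\hat{\mathbf x}_1^m, \ldots, \hat{\mathbf x}_K^m) = \hat{\mathbf x}^m$, $m = 0, 1, 2$. The common probability
distribution of the vector of messages of $K$ sources is denoted by
$P^* = \{P^*(x), x \in \mathcal X\}$. Let $d_k^m : \mathcal X_k \times \hat{\mathcal X}_k^m \to [0, \infty)$, $k = 1, \ldots, K$,
$m = 0, 1, 2$ be the corresponding distortion measures. Distortion
for $N$-vectors is defined by averaging. A code is a family
of six mappings: (i) three encoding functions $f_0 : (\mathcal X)^N \to
\{1, \ldots, M_0(N)\}$, $f_1 : \{1, \ldots, M_0(N)\} \to \{1, \ldots, M_1(N)\}$,
$f_2 : \{1, \ldots, M_0(N)\} \to \{1, \ldots, M_2(N)\}$, and (ii) three decoding
functions $F_m : \{1, \ldots, M_m(N)\} \to (\hat{\mathcal X}^m)^N$, $m = 0, 1, 2$.
Let $A_0 = \{\mathbf x : F_0(f_0(\mathbf x)) = \hat{\mathbf x}^0,\ d_k^0(\mathbf x_k, \hat{\mathbf x}_k^0) \le \Delta_k^0,\ k \in \mathcal G_r^0\}$,

$A_m = \{\mathbf x : F_m(f_m(f_0(\mathbf x))) = \hat{\mathbf x}^m,\ d_k^m(\mathbf x_k, \hat{\mathbf x}_k^m) \le \Delta_k^m,\ k \in \mathcal G_r^m,$

$d_k^m(\mathbf x_k, \hat{\mathbf x}_k^m) \ge \Delta_k^m,\ k \in \mathcal G_s^m\}$, $m = 1, 2$.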

Security  evaluation  by  distortion  measures  was  first  consid¬ 
ered  by  Yamamoto  in  [2]  and  later  in  [3]. 

Error probabilities of the code $(f, F)$ are: $e_m =
1 - P^{*N}(A_m)$, $m = 0, 1, 2$. For brevity we denote
$(\Delta_1^m, \ldots, \Delta_K^m) = \Delta^m$, $(\Delta^0, \Delta^1, \Delta^2) = \Delta$. A triplet of
nonnegative numbers $(R_0, R_1, R_2)$ is called $(E, \Delta)$-achievable for
$E > 0$, $\Delta_k^m \ge 0$, $k = 1, \ldots, K$, $m = 0, 1, 2$, if for any $\varepsilon > 0$ and $N$
sufficiently large there exists a code $(f, F)$ such that

$N^{-1} \log M_m(N) \le R_m + \varepsilon$, $e_m \le \exp(-NE)$, $m = 0, 1, 2$.

The inner and the outer bounds for the rates-reliability-
distortions-partial secrecy region $\mathcal R(E, \Delta)$ are constructed.
When $E \to 0$, we obtain the inner and outer bounds for the
corresponding rates-distortions-partial secrecy region $\mathcal R(\Delta)$.

In  a  special  case  we  arrive  at  the  results  of  Yamamoto  [1] 
for  a  bidirectional  branching  communication  system,  but  our 
inner  bound  is  larger.  The  results  are  consistent  with  the 
corresponding  results  from  [2],  [4],  [5]. 

Remark:  Some  cases  of  coincidence  of  the  inner  and  the 
outer bounds are pointed out.

Abstract — This paper presents a theoretical analysis
ysis  of  a  recent  algorithm  for  fast  correlation  attacks, 
based  on  the  use  of  convolutional  codes  [1]. 

I.  Introduction 

Consider  a  binary  synchronous  stream  cipher  where  a  correla¬ 
tion  has  been  identified  between  the  keystream  sequence  and 
the  output  from  one  of  the  LFSRs.  Then  a  correlation  attack 
can  be  applied  [2,  3]. 

Let the LFSR have length $l$ and let the set of possible LFSR
sequences be denoted by $\mathcal L$. Clearly, $|\mathcal L| = 2^l$, and for a fixed
length $N$ the truncated sequences from $\mathcal L$ also form a linear $[N, l]$
block code, referred to as $C$. Furthermore, the keystream sequence
$z = z_1, z_2, \ldots, z_N$ is regarded as the received channel
output and the LFSR sequence $u = u_1, u_2, \ldots, u_N$ is regarded
as a codeword from $C$. Due to the correlation between $u_i$
and $z_i$, we can describe each $z_i$ as the output of the binary
symmetric channel, BSC, when $u_i$ was transmitted. The
correlation probability $1 - p$, defined by $1 - p = P(u_i = z_i)$, gives
$p$ as the crossover probability (error probability) in the BSC.

The  algorithm  proposed  in  [1]  transforms  a  part  of  the  code 
C  stemming  from  the  LFSR  sequences  into  a  convolutional 
code.  The  encoder  of  this  convolutional  code  is  created  by 
finding  suitable  parity  check  equations  from  C.  Here  we  can 
only give a brief sketch of the methods to create this
convolutional code; for a complete description see [1].

Let us start with the linear code $C$ stemming from the LFSR
sequences. There is a corresponding $l \times N$ systematic generator
matrix $G_{LFSR} = (I_l \; Z)$. Let $g_i$ be the $i$th column of
$G_{LFSR}$. Clearly, $u_i = u_0 g_i$, where $u_0$ is the initial state of
the LFSR. Fix the memory of the convolutional code to $B$. We are
now interested in finding parity check equations that involve
a current symbol $u_n$, an arbitrary linear combination of the
$B$ previous symbols $u_{n-1}, \ldots, u_{n-B}$, together with at most $t$
other symbols. Clearly, $t$ should be rather small. Parity check
equations for $u_{B+1}$ with weight $t$ outside the first $B + 1$
positions can then be found by finding linear combinations of $t$
columns of $G_{LFSR}$ such that

$$u_{i_1} + \cdots + u_{i_t} = u_0 (g_{i_1} + \cdots + g_{i_t}) = \sum_{j=1}^{B} c_j u_{B+1-j} + u_{B+1}.$$
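The column search described above can be sketched on a toy LFSR. Everything specific here is an assumption made for illustration: the length-8 recurrence $u_i = u_{i-4} + u_{i-5} + u_{i-6} + u_{i-8}$, the parameters $B = 4$ and $t = 2$, zero-based indexing (so the "current" symbol is index $B$), and the brute-force search, which is not the search method of [1].

```python
from itertools import combinations
import random

def lfsr_columns(taps, length, n):
    """Columns g_i as bit masks over the initial state, so u_i = <u_0, g_i> mod 2."""
    cols = [1 << k for k in range(length)]      # systematic part (identity)
    while len(cols) < n:
        g = 0
        for t in taps:                          # same recurrence as the sequence
            g ^= cols[-t]
        cols.append(g)
    return cols

def lfsr_sequence(taps, state, n):
    """LFSR output sequence generated directly from the initial state bits."""
    seq = list(state)
    while len(seq) < n:
        seq.append(sum(seq[-t] for t in taps) % 2)
    return seq

def find_parity_checks(cols, length, memory, weight):
    """Index sets whose column sum has a 1 at bit `memory` and 0 above it."""
    low_mask = (1 << memory) - 1
    high_mask = ((1 << length) - 1) ^ low_mask
    checks = []
    for idx in combinations(range(memory + 1, len(cols)), weight):
        s = 0
        for i in idx:
            s ^= cols[i]
        if (s & high_mask) == (1 << memory):
            checks.append((idx, s & low_mask))  # low bits are the coefficients c_j
    return checks

def check_holds(seq, idx, coeffs, memory):
    """Verify sum of u_i over idx equals the combination of the first symbols."""
    lhs = sum(seq[i] for i in idx) % 2
    rhs = (sum(seq[j] for j in range(memory) if (coeffs >> j) & 1) + seq[memory]) % 2
    return lhs == rhs

TAPS, LENGTH, MEMORY, WEIGHT, N = (4, 5, 6, 8), 8, 4, 2, 200
columns = lfsr_columns(TAPS, LENGTH, N)
checks = find_parity_checks(columns, LENGTH, MEMORY, WEIGHT)
rng = random.Random(1)
state = [rng.randint(0, 1) for _ in range(LENGTH)]
sequence = lfsr_sequence(TAPS, state, N)
all_hold = all(check_holds(sequence, idx, c, MEMORY) for idx, c in checks)
```

Each entry of `checks` pairs a set of $t$ positions with the coefficient mask of the first $B$ symbols, and the final line confirms every equation on an independently generated sequence.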

To recover the initial state of the LFSR it is enough to
decode $l$ consecutive information bits correctly. However, in
our application there is neither a starting state nor an ending
state. To deal with this problem we decode over $J$ symbols,
where $J \approx l + 10B$. Optimal (ML) decoding of convolutional
codes uses the Viterbi algorithm. The
estimate from the Viterbi algorithm is then used to provide
the corresponding estimate of the initial state of the LFSR.

II.  Theoretical  analysis 

The  principle  of  the  analysis  is  the  following.  We  start  by 
calculating  the  average  number  of  parity  check  equations  that 
we  find  by  the  proposed  algorithm,  which  gives  us  the  rate  of 
the  convolutional  code.  Let  E[m]  be  the  expected  number  of 
parity  check  equations.  Then  it  can  be  shown  that 

E[m ]  = 

In our case, we consider an “embedded” convolutional
code. Then the received symbol $r_n^{(i)}$ corresponding to codeword
symbol $v_n^{(i)}$ is given as the sum of $t$ keystream symbols,
$r_n^{(i)} = z_{j_{1i}} + \cdots + z_{j_{ti}}$. If $P(z_i = u_i) = 1/2 + \delta$, it can be shown
that $P(r_n^{(i)} = v_n^{(i)}) = 1/2 + \epsilon$, where

$$\epsilon = 2^{t-1} \delta^t.$$
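The relation $\epsilon = 2^{t-1}\delta^t$ is the piling-up lemma for $t$ independent noise bits. The sketch below checks it by exhaustive enumeration, assuming independent symbols that each agree with probability $1/2 + \delta$; the function name is illustrative.

```python
from itertools import product

def agreement_probability(delta, t):
    """P(sum of t independent noise bits is 0), each bit 0 w.p. 1/2 + delta."""
    p_zero = 0.5 + delta
    total = 0.0
    for bits in product((0, 1), repeat=t):
        if sum(bits) % 2 == 0:                  # an even number of errors cancels
            prob = 1.0
            for b in bits:
                prob *= p_zero if b == 0 else 1.0 - p_zero
            total += prob
    return total

delta = 0.1
enumerated = {t: agreement_probability(delta, t) for t in range(1, 5)}
closed_form = {t: 0.5 + 2 ** (t - 1) * delta ** t for t in range(1, 5)}
```

The rapid decay of $\epsilon$ with $t$ is the reason $t$ should be kept small.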


Given  the  number  of  equations  and  the  error  probability  of 
the  BSC  we  can  use  results  from  convolutional  coding  to  get 
a  bound  on  the  burst  error  probability  of  the  convolutional 
code.  This  burst  error  probability  determines  the  probability 
that  the  proposed  attack  fails.  Finally,  we  fix  the  rate  to  be 
R  =  Ro,  where  Ro  is  the  computational  cutoff  rate.  Based 
on these assumptions, Theorem 1 gives the required initial
correlation for given length $N$, LFSR length $l$, and algorithm
parameters $B$ and $t$.


Theorem 1 With probability $1 - p_e$, $p_e < 1$, the proposed
attack succeeds if


8 


41n  2  •  2l~B  \  ^ 

ev)  ) 


where the correlation probability is $P(z_i = u_i) = 1/2 + \delta$, and
$p_e < J 2^{-B}$.


The result of Theorem 1 agrees well with the simulation
results presented in [1].

Abstract — We show that fast correlation attacks
based on Gallager's decoding algorithm with parity-
check equations of weight 4 or 5 usually provide better
performance than all previously known attacks.

I.  Introduction 

In a binary additive stream cipher the ciphertext is
obtained by adding bitwise the plaintext to a pseudo-random
sequence $s$. This running-key is produced by a pseudo-random
generator whose initialization is the secret key of the cipher.
A classical method for generating a running-key is to combine
$n$ LFSRs by a Boolean function $f$. Correlation attacks
introduced by Siegenthaler [5] exploit the existence of a correlation
between the running-key and the output of one constituent
LFSR for recovering the initialization of each LFSR
separately. When the combining function is $t$-th order
correlation-immune, this attack should examine $(t + 1)$ LFSRs together.

Proposition 1 Let $t$ denote the maximal correlation-immunity
order of the combining function $f$. Then there exists a
subset of $t + 1$ variables, $\{x_{i_1}, \ldots, x_{i_{t+1}}\}$, and a Boolean
function $g$ with $t + 1$ variables such that $p_g = \Pr[f \ne g] < 1/2$.
Moreover, the lowest possible value of $p_g$ is achieved by the
affine function $g = \sum_{j=1}^{t+1} x_{i_j} + \varepsilon$ where $\varepsilon \in \{0, 1\}$.

Let $\sigma$ denote the sequence produced by these $(t + 1)$ LFSRs
combined by the affine function $g$. The running-key sequence $s$
can then be seen as the result of the transmission of $\sigma$ through
the binary symmetric channel with error probability $p_g$. The
sequence $\sigma$ corresponds to the output of a unique LFSR of
length $L$ whose feedback polynomial $P$ is derived from the
feedback polynomials of the constituent LFSRs. Any
subsequence of length $N$ of $\sigma$ is then a codeword of an $[N, L]$-linear
code $C$. The attack aims at recovering $L$ consecutive bits of $\sigma$
from the knowledge of $N$ bits of $s$. This can be done by
decoding $(s_n)_{n<N}$ relative to $C$. The Meier-Staffelbach attack uses
the iterative decoding process due to Gallager [1] with parity-
check equations of weight 3. Johansson and Jonsson recently
proposed two new techniques for fast correlation attacks based
on convolutional codes [2] and on turbo codes [3].

II.  Attack  based  on  Gallager  algorithm 
The preprocessing step of the attack consists in generating
all parity-check equations involving $d$ bits of the sequence
$(\sigma_n)_{n<N}$. They correspond to all polynomials $Q(X)P(X)$ of
weight $d$ and of degree at most $N$, where $P$ is the feedback
polynomial of the LFSR generating $\sigma$. The number of such
equations involving the $n$-th bit of $\sigma$ is approximately

$$m(d) \simeq \frac{N^{d-1}}{(d-1)!\, 2^L}.$$

1also  with  Ecole  Polytechnique  -  91128  Palaiseau  Cedex  -  France 


Using these parity-check equations we recover $(\sigma_n)_{n<N}$ from
$(s_n)_{n<N}$ using Gallager's soft-input/soft-output decoding
algorithm [1]. Simulations provide an approximation of the
minimum value of $m(d)$ for convergence of the decoding algorithm:

$$m(d) \ge \frac{K_d}{C_{d-2}(p)},$$

where $C_{d-2}(p)$ is the capacity of the binary symmetric channel
with error probability $p_{d-2} = \frac{1}{2}(1 - (1 - 2p)^{d-2})$, $K_d \simeq 1$ if
$d \ge 4$ and $K_3 \simeq 2$.
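The convergence condition can be evaluated for the parameters quoted in Section III ($L = 40$, $N = 400{,}000$, $d = 4$). The sketch below takes the approximations $m(d) \simeq N^{d-1}/((d-1)!\,2^L)$ and $m(d) \ge K_d/C_{d-2}(p)$ with $K_4 = 1$ as assumptions; the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def num_parity_checks(n_bits, lfsr_len, d):
    """Assumed approximation m(d) ~ N^(d-1) / ((d-1)! * 2^L)."""
    return n_bits ** (d - 1) / (math.factorial(d - 1) * 2 ** lfsr_len)

def required_parity_checks(p, d, k_d=1.0):
    """Threshold K_d / C_{d-2}(p), with p_{d-2} = (1 - (1 - 2p)^(d-2)) / 2."""
    p_eff = 0.5 * (1 - (1 - 2 * p) ** (d - 2))
    return k_d / (1 - binary_entropy(p_eff))

available = num_parity_checks(400_000, 40, 4)
needed_at_044 = required_parity_checks(0.44, 4)
needed_at_045 = required_parity_checks(0.45, 4)
```

With these assumptions the available number of equations exceeds the threshold at $p = 0.44$ but not at $p = 0.45$, which is in line with the maximum error probability reported in Section III.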

III.  Comparison  with  previous  attacks 

The attack presented in [2] uses a convolutional code with
memory $B$. This code is defined by all equations involving $\sigma_n$
and $d - 1$ bits of $\sigma$ outside positions $n - 1, \ldots, n - B$. The
number of such equations is approximately

$$m_B(d) \simeq \frac{N^{d-1}}{(d-1)!\, 2^{L-B}}.$$

The attack then consists in decoding a sequence $r$ such that
$\Pr[r_n \ne \sigma_n] = p_{d-1}$. The Viterbi algorithm then converges if

$$m_B(d) \ge \frac{K'}{C_{d-1}(p)},$$

where $K'$ slightly depends on $L$ ($K' = 3$ for $L = 21$ and
$K' = 2.5$ for $L = 40$). It follows that this attack with $d = 3$
achieves the same performance as the Gallager algorithm with
$d = 4$ only for high values of $B$. This makes the decoding step
intractable due to the complexity and the memory requirements
of the Viterbi algorithm. The only advantage of the attack
based on convolutional codes is the lower complexity of the
preprocessing step, but this part is performed once and for all.

As an example, for $L = 40$ and $N = 400{,}000$, the maximum
error probability achieved by our attack with $d = 4$ is $p =
0.44$. In this case, the preprocessing step and the decoding
step take respectively 9 hours and 1.5 hours on a DEC Alpha
workstation. For these parameters, the attacks described in [2]
and [3] respectively achieved $p = 0.40$ with $d = 3$ and $B = 15$
and $p = 0.41$ with $d = 3$, $M = 8$ and $B = 13$.

Abstract — A powerful family of algorithms for the
fast correlation attack [1] with significantly better
performance, assuming the same inputs, than previously
reported  methods,  is  proposed.  The  family  is  based 
on  the  iterative  decoding  principle  in  conjunction  with 
a  novel  method  for  constructing  the  parity-checks. 

Let $\Omega_n$ be the set of all considered parity-check equations
related to the $n$-th parity bit of an $(N, L)$ punctured simplex
code codeword, defined as follows:

Each parity-check equation is the mod-2 sum of the $n$-th row
of the parity-check matrix $H = [P^T, I_{N-L}]$ and at most $W$
other rows, provided that the values in the positions $i = B +
1, B + 2, \ldots, L$ are all zeros, where $B < L$ is a predetermined
parameter, and where the $m$-th row of the matrix $P^T$ is equal
to the first row of the $m$-th power of the LFSR $L \times L$ state
transition matrix.

Theorem  1:  For  any  (N,L)  punctured  simplex  code,  an  ap¬ 
proximation  on  the  expected  number  p  of  parity  checks  per 
parity  bit,  assuming  each  parity  check  includes  only  a  certain 
subset  from  B  fixed  bits  among  the  L  information  bits,  and 
no  more  than  W  +  1  other  check  bits,  is  given  by: 


where  0  <  B  <  L  and  1  <  W  <  L  —  B. 

In the following, the main steps of the family of algorithms
are summarized (see [3] for a complete description).

1. Hypothesis setting

From the set of all possible $2^B$ binary patterns obtained
from the first $B$ information bits, select a previously not
considered pattern $\hat x_1, \hat x_2, \ldots, \hat x_B$. If no new pattern is
available, terminate the procedure.

2. Iterative decoding

Identify the sets $\Omega_n$ of parity-check equations, $n = L +
1, L + 2, \ldots, N^*$, and choose the desired iterative decoding
of the codeword $[z_1, z_2, \ldots, z_{N^*}]$ from the
following iterative decoding approaches:

•  Bit  Flipping  (BF), 

•  A Posteriori Probability (APP),

•  Belief  Propagation  (BP), 

•  Belief  Propagation  Based  Bit  Flipping  (BP-BF). 

1This  work  was  supported  by  JSPS  Grant  RFTF  96P00604  and 
by  NSF  Grant  CCR-97-32959. 


Generate an estimation of the codeword parity bits
$\hat x_{L+1}, \hat x_{L+2}, \ldots, \hat x_{N^*}$.

3. Final correlation check

Using the sequence $\hat x_{L+1}, \hat x_{L+2}, \ldots, \hat x_{N^*}$, perform
information set decoding.

The performance of the previous family of algorithms is
experimentally considered when the LFSR characteristic
polynomial is $1 + u + u^3 + u^5 + u^9 + u^{11} + u^{12} + u^{17} + u^{19} + u^{21} +
u^{25} + u^{27} + u^{29} + u^{32} + u^{33} + u^{38} + u^{40}$ (i.e., assuming the same
example as was considered in [2]).

Figure  1  depicts  the  percentage  of  error  free  information  sets 
obtained  for  different  values  of  crossover  probability  p  with 
the  employed  types  of  algorithms  and  N *  =  4096. 


Figure  1:  Percentage  of  error  free  information  sets  as 
a  function  of  the  BSC  crossover  probability  p  for  the 
(4096,40)  truncated  simplex  code  with  B  =  22  and 
$W + 1 = 3$.

Abstract — We examine the implications of using
a Low Density Parity Check Code (LDPCC) in
place of the usual Goppa code in McEliece’s
cryptosystem. Using an LDPCC allows for larger block
lengths and the possibility of a combined error
correction/encryption protocol.

I.  Introduction 

If  one  wishes  to  use  a  LDPCC  in  the  McEliece  system,  there 
are  several  ways  to  proceed.  An  efficient  way  seems  to  be  the 
following: 

As usual, suppose Bob wishes to send Alice a secure message
over an insecure channel. Alice chooses a random $(n-k) \times n$
sparse parity check matrix, $H$, for a binary LDPCC, $C$, that
admits decoding of any pattern of $t$ or fewer errors with, say,
belief propagation. She also randomly chooses sparse invertible
matrices $S \in GL(k, \mathbb{F}_2)$ and $T \in GL(n-k, \mathbb{F}_2)$. She then
calculates $\tilde{H} := TH$ and has keys:

Public Key: $(\tilde{H}, S, t)$

Private Key: $(H, T)$

Now, if Bob wants to send Alice the message $m$, he first computes
the generator matrix, $\tilde{G}$, for the code $C$ in row reduced
echelon form, and then computes $G = S^{-1}\tilde{G}$. He then applies
the encryption map:

$m \mapsto mG + e =: y$

where $e$ is a random error vector of weight at most $t$. Alice's
decryption procedure is then as follows: Since $G$ and $\tilde{G}$ define
the same code, $C$, she can use $H$ to decode the word $y$ to
$mG = mS^{-1}\tilde{G}$. Since $\tilde{G}$ is in row reduced echelon form, this
reveals $mS^{-1}$ in the $k$ coordinates of $mG$ in which $\tilde{G}$ has
only one nonzero entry (i.e., the systematic coordinates of $\tilde{G}$).
Right multiplication by $S$ finally recovers Bob's message $m$.
This  seems  relatively  efficient  because  the  keys  consist  of 
sparse  matrices,  allowing  considerable  compression.  Hence, 
one  could  have  key  sizes  comparable  to  those  of  a  (1024,  512) 
McEliece  system,  but  for  a  code  of  size  (16384,  8192). 

II.  Security 

The  security  of  this  system  is  based  on  two  observations: 

•  If $T$ is chosen with the proper parameters, $\tilde{H}$ will most
likely not admit decoding with, e.g., belief propagation,
for the correction of up to $t$ errors.

•  It seems difficult to recover a matrix, $H'$, equivalent to
$\tilde{H}$ that admits decoding with, e.g., belief propagation,
for the correction of up to $t$ errors. In particular it seems
difficult to recover the specific degree structure of the
parity check matrix $H$.

1The research is supported in part by NSF grant DMS-96-10389.


However, a simple observation shows that if $T$ is chosen too
sparsely, this latter task is not difficult. In what follows, if
$u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$ are two vectors over
$\mathbb{F}_2$, $u * v := (u_1 v_1, \ldots, u_n v_n)$ denotes the intersection of the
binary vectors $u$, $v$. This is a vector whose support is exactly
$\mathrm{supp}(u) \cap \mathrm{supp}(v)$. Equivalently, it can be considered as the
'AND' of $u$ and $v$.

Let $h_1, \ldots, h_{n-k}$ denote the row vectors of $H$ and
$\tilde{h}_1, \ldots, \tilde{h}_{n-k}$ the row vectors of $\tilde{H}$. Notice that the $h_i$ are
sparse vectors and each $\tilde{h}_j$ is a linear combination of the $h_i$.
Furthermore, if $T$ is sparse, each $\tilde{h}_j = h_{j_1} + \cdots + h_{j_{w_j}}$ with
the $w_j$ small. That is, each $\tilde{h}_j$ is a linear combination of a
small number of rows of $H$. If the $w_j$ are too small (i.e., $T$
is too sparse), then with reasonable probability one has that
$\tilde{h}_j * h_{j_m} = h_{j_m}$ for many of the rows $\tilde{h}_j$, $1 \le j \le n-k$, and
their components $h_{j_m}$.

In this case, since each $h_{j_m}$ appears in several of the $\tilde{h}_j$, we
can, with non-negligible probability, find $j_1, j_2$ such that

$\tilde{h}_{j_1} * \tilde{h}_{j_2} = h_i$

for some $i$. Thus, in time $k(k-1)/2$, we can recover some of the
original rows of $H$ by computing the intersection of all pairs of
rows, checking to see if the intersection is in Rowsp($H$). Having
found some of the original rows, we can determine, with
high probability, which of the $\tilde{h}_j$ have these rows as components
in their linear combinations. We thus subtract each
original row from the $\tilde{h}_j$ that have many nonzero coordinates
in common with it. We then go back to computing the intersection
of all pairs of rows again, and keep repeating until we have
found sufficiently many original rows to allow decoding.
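
One pass of the pairwise-intersection step can be illustrated on a toy instance. The Python sketch below is only an illustration under stated assumptions: rows are stored as integers used as bit vectors over F2, the helper names (`xor_basis`, `in_rowspace`, `candidate_rows`) and the weight threshold `max_weight` are hypothetical, and the three-row example with disjoint supports is far smaller and cleaner than any realistic parameter choice. It does not implement the subsequent subtract-and-repeat loop.

```python
from itertools import combinations

def xor_basis(rows):
    """Basis of the F2 row space; each row is an int used as a bit vector."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)   # clears the leading bit of b from r if present
        if r:
            basis.append(r)
    return basis

def in_rowspace(v, basis):
    """True if v is an F2-linear combination of the basis vectors."""
    for b in basis:
        v = min(v, v ^ b)
    return v == 0

def candidate_rows(public_rows, max_weight):
    """Intersect (AND) all pairs of public rows and keep the sparse,
    nonzero intersections that lie in the row space."""
    basis = xor_basis(public_rows)
    found = set()
    for a, b in combinations(public_rows, 2):
        c = a & b
        if c and bin(c).count("1") <= max_weight and in_rowspace(c, basis):
            found.add(c)
    return found

# Toy instance (assumption): three sparse secret rows with disjoint supports.
h1, h2, h3 = 0b000000111, 0b000111000, 0b111000000
# A sparse invertible mixing T: rows (1,1,0), (1,0,1), (0,0,1).
public = [h1 ^ h2, h1 ^ h3, h3]
recovered = candidate_rows(public, max_weight=3)
```

In this toy case the intersections of the public rows expose `h1` and `h3` directly; with overlapping supports or a denser T, far fewer pairs would intersect in an original row.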

III.  Conclusion 

Empirical evidence has shown this attack, and some variants
of it, to be effective enough that we consider this system
insecure unless T is chosen to be dense. Thus, there seems to
be no advantage to using a parity check matrix as the public
key. However, this system is still of possible interest in the
following case: If one is using a LDPCC for error correction,
some security can be added at very little extra cost.

Abstract — This paper presents a wide class of signal
processing algorithms which employs a nonlinear
operation in the time domain and is capable of providing
good power/bandwidth tradeoffs with OFDM
transmission. A suitable analytical approach is proposed
for efficiently evaluating performances within
this class of algorithms, and several performance
results are shown and discussed in detail.

A well-known, major drawback of conventional OFDM
schemes (Orthogonal Frequency Division Multiplexing) is
their high PMEPR (Peak-to-Mean Envelope Power Ratio),
leading to amplification difficulties: in order to avoid the
out-of-band radiation levels which are inherent to nonlinear
distortion, power amplifiers for OFDM transmission are required
to have strongly linear characteristics and/or a significant
input backoff has to be adopted. Therefore, a reduced power
efficiency is the price to pay for a high bandwidth efficiency.

In this paper we propose a technique to reduce the envelope
fluctuations with OFDM transmission, while preserving
low out-of-band radiation levels and small in-band
"self-interference" effects. This technique is related to those
proposed in [1]-[3] but introduces further flexibility. The
signals to be transmitted are generated as follows: an augmented
block $\{S'_k; k = 0, 1, \ldots, N'-1\}$ is obtained by adding
$N'-N$ zeros to the data block $\{S_k; k = 0, 1, \ldots, N-1\}$,
where $N' = NM$, for a selected integer $M$; the IDFT of
this frequency-domain block is computed, leading to the block
$\{s'_n; n = 0, 1, \ldots, N'-1\}$; each time-domain sample, $s'_n$, is
submitted to a nonlinear operation, leading to the modified
sample $s^C_n = f_C(|s'_n|)\exp(j \arg(s'_n))$; a DFT brings the
"nonlinearly corrected" block back to the frequency domain, where
a shaping operation is performed by a multiplier bank with
selected coefficients $G_k$, $k = 0, 1, \ldots, N'-1$, so as to obtain
the final frequency-domain block $\{S^{CF}_k; k = 0, 1, \ldots, N'-1\}$;
etc. For a given input block size $N$, a specific algorithm can
be designed through the selection of $M$, $f_C(\cdot)$ and
$\{G_k; k = 0, 1, \ldots, N'-1\}$.
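
The processing chain just described can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions not fixed by the text: the zeros are inserted in the middle of the frequency-domain block (the usual oversampling layout), the nonlinearity is an ideal envelope clipper, and the names `augment`, `envelope_clip` and `process_block` are hypothetical.

```python
import numpy as np

def augment(S, M):
    """Zero-pad the N data symbols to N' = N*M frequency bins.
    Assumption: zeros go in the middle of the block."""
    N = len(S)
    S_aug = np.zeros(N * M, dtype=complex)
    half = N // 2
    S_aug[:half] = S[:half]
    S_aug[N * M - (N - half):] = S[half:]
    return S_aug

def envelope_clip(s_max):
    """AM/AM conversion of an ideal envelope clipper (zero AM/PM)."""
    return lambda r: np.minimum(r, s_max)

def process_block(S, M, f_c, G):
    """IDFT -> memoryless nonlinearity on the envelope -> DFT -> shaping."""
    s = np.fft.ifft(augment(S, M))                    # oversampled time block
    s_c = f_c(np.abs(s)) * np.exp(1j * np.angle(s))   # phase is preserved
    S_cf = np.fft.fft(s_c) * G                        # multiplier bank G_k
    return S_cf, s_c

# Example: QPSK data, N = 64, M = 4, clipping at 1.5 times the rms envelope.
rng = np.random.default_rng(0)
N, M = 64, 4
S = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
G = (np.abs(augment(np.ones(N), M)) > 0).astype(float)  # 1 on data bins, else 0
s_max = 1.5 * np.sqrt(np.mean(np.abs(np.fft.ifft(augment(S, M))) ** 2))
S_cf, s_c = process_block(S, M, envelope_clip(s_max), G)
```

With this choice of `G`, all out-of-band bins of `S_cf` are zero, at the price of some envelope regrowth after the final IDFT.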

Adding $N'-N$ zeros to each initial frequency-domain block
and computing the IDFT of the augmented block is equivalent
to oversampling, by a factor $M = N'/N$, the "OFDM
burst" which should directly correspond to the case where
$M = 1$. The nonlinear operation $f_C(\cdot)$ corresponds to a bandpass
memoryless nonlinearity, characterized by an "AM/AM
conversion" function $f_C(\cdot)$ and an "AM/PM conversion" function
equal to zero, and can be used to reduce the envelope
fluctuations and the PMEPR values. The subsequent
frequency-domain operation using the set $\{G_k; k = 0, 1, \ldots, N'-1\}$
can provide a complementary filtering effect. For instance, by
adopting $G_k = 1$ for the $N$ "data subcarriers", and $G_k = 0$ for
the remaining $N'-N$ ones, we completely eliminate the
out-of-band distortion effects of the nonlinear function $f_C(\cdot)$;
however, this leads to some regrowth of the envelope fluctuations.
A suitable $M > 1$ reduces the in-band "self-interference"
which is due to the nonlinear distortion inherent to $f_C(\cdot)$ (e.g.,
an "envelope clipping" function), as compared with that
concerning $M = 1$.

Whenever the number of subcarriers is high, conventional
OFDM signals are known to exhibit a Gaussian-like nature.
One can take advantage of this for evaluating performances
by analytical means, so as to find an appropriate triple choice
($M$, $f_C(\cdot)$ and $\{G_k; k = 0, 1, \ldots, N'-1\}$) for any given
application. All we really need in our case is to use well-established
results on bandpass memoryless nonlinearities with Gaussian
inputs [3, 4]. In this paper, we employ these results to
derive a suitable characterization of the frequency-domain block
$\{S^{CF}_k; k = 0, 1, \ldots, N'-1\}$ which replaces the frequency-domain
block $\{S_k; k = 0, 1, \ldots, N-1\}$ of conventional OFDM.

By assuming that $E[S_k] = 0$ and $E[S_k S^*_{k'}] = E[|S_k|^2]$ for
$k = k'$ and zero otherwise, it is shown that the transmitted
frequency-domain samples can be decomposed into two
uncorrelated terms, a "useful" term and a "self-interference"
term, as follows: $S^{CF}_k = \alpha S'_k G_k + D_k G_k$, where
$\alpha = E[|s'_n| f_C(|s'_n|)] / E[|s'_n|^2]$. It is shown that $E[D_k] = 0$ and
$E[D_k D^*_{k'}] = 0$ for $k \neq k'$; when $M = 1$, $E[|D_k|^2]$ is shown
not to depend on $k$, but this is no longer true when $M > 1$.
Moreover, for any $k$, $D_k$ exhibits quasi-Gaussian characteristics
under the 'high N' assumption mentioned above (say
$N > 64$).

The characterization of the transmitted frequency-domain
block, including the appropriate values for $E[|D_k|^2]$, is then
used for both power spectrum and BER computations. The
main issue here is to evaluate the impact of the transmitted,
noise-like, "self-interference" on both bandwidth efficiency
and power efficiency. For this purpose, our analytical approach
provides quite accurate results through a modest
computational effort.

A set of performance results is presented and discussed in
detail, showing that good power/bandwidth tradeoffs can be
achieved within the proposed class of algorithms.

Abstract — The peak-to-average power ratio
PAPR(C) of a code C is an important characteristic
of that code when it is used in OFDM communications.
We establish bounds on the region of achievable
triples (R, d, PAPR(C)) where R is the code rate
and d is the minimum Euclidean distance of the code.
We prove a lower bound on PAPR in terms of R and d
and show that there exist asymptotically good codes
whose PAPR is at most 8 log n. We give explicit
constructions of error-correcting codes with low PAPR by
employing bounds for hybrid exponential sums over
Galois fields and rings.

I.  Introduction 

A major barrier to the widespread acceptance of OFDM
is the high peak-to-average power ratio (PAPR) of uncoded
OFDM signals. By appropriately coding the OFDM signals,
this PAPR can be reduced. It is also possible to introduce
error-correction capability by using such a code. Here we
investigate the fundamental trade-offs between the parameters
R, d and PAPR of a code C.

Our codewords $c \in C$ are complex vectors of length $n$ with
$\|c\|^2 = n$. The OFDM signal corresponding to $c$ as a function
of time $t$ is the real part of:

$$S_c(t) = \sum_{i=0}^{n-1} c_i \exp(-2\pi j (f_0 + i f_s) t)$$

for $0 \le t \le 1/f_s$, where $f_0$ is the carrier frequency and $f_s$ is
the bandwidth of each tone. We define PAPR(c), the
peak-to-average power ratio of the OFDM signal corresponding to
$c$, to be

$$\frac{1}{n} \max_t \left[\Re(S_c(t))\right]^2.$$

We define $\mathrm{PAPR}(C) = \max_{c \in C} \mathrm{PAPR}(c)$.
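
The definition of PAPR(c) can be evaluated numerically by sampling $t$ on a fine grid. The sketch below is an illustration only: the function name `papr`, the default parameters and the grid size are assumptions, the exponent sign follows our reading of the signal definition, and sampling can only approximate the true maximum from below.

```python
import numpy as np

def papr(c, f0=0.0, fs=1.0, samples=4096):
    """Approximate PAPR(c) = (1/n) * max_t [Re S_c(t)]^2 on a uniform grid
    of `samples` points covering one symbol period 0 <= t < 1/fs."""
    c = np.asarray(c, dtype=complex)
    n = len(c)
    t = np.arange(samples) / (samples * fs)
    tones = f0 + fs * np.arange(n)          # tone frequencies f0 + i*fs
    S = np.exp(-2j * np.pi * np.outer(t, tones)) @ c
    return np.max(S.real ** 2) / n

# The all-ones word adds coherently at t = 0 and reaches the largest
# possible value n for a word with squared norm n.
worst = papr(np.ones(8))
```

By the Cauchy-Schwarz inequality, $|S_c(t)|^2 \le n \|c\|^2 = n^2$, so no codeword with $\|c\|^2 = n$ can exceed a PAPR of $n$.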

Statement  of  The  Problem:  What  is  the  achievable  region 
of  triples  (R,d,PAPR(C))? 

II.  Bounds  on  the  PAPR  of  codes 


Lemma 1 Let $d_*$ denote the minimum Euclidean distance between
the codewords of $C$ and the points of $\Omega \cup -\Omega$. Then
$d_* \le \sqrt{2n}$ and

$$\mathrm{PAPR}(C) = n(1 - \delta)^2$$

where $\delta = d_*^2/2n$.

Using  the  above  lemma  together  with  a  packing  argument, 
we  can  prove  a  lower  bound  on  PAPR(C )  as  a  function  of 
R  and  d.  This  bound  is  rather  complex  to  state  and  we  do 
not  include  it  here.  We  also  have  the  following  analogue  of 
the  Gilbert-Varshamov  bound,  establishing  a  region  of  pairs 
(R,d)  in  which  asymptotically  good  sequences  of  codes  with 
low  PAPR  are  guaranteed  to  exist: 

Theorem  2  Let  R  >  0  and  A  >  0  be  such  that 

2*^(1 -|)<1. 

Then for all sufficiently large n, there exists a code C of length
n, rate R and minimum Euclidean distance $d = \sqrt{2An}$ with
PAPR(C) < 8 log n.


III.  Codes  from  exponential  sums 

We can explicitly describe families of codes with PAPR
growth of order $(\log n)^2$, where $n$ is the code length. The
families we consider are length $n = 2^m$, $2^e$-PSK codes and
are derived from special cases of what we call lengthened trace
codes. The lengthened trace codes are linear over $\mathbb{Z}_{2^e}$ and
their codewords can be roughly characterised as having a
representation as the trace of a polynomial function evaluated on
a Galois field ($e = 1$) or Galois ring ($e > 1$). The technique
we use to bound PAPR applies to any code whose DFT is
uniformly small. We use bounds for hybrid exponential sums
over Galois fields and rings to bound the DFT coefficients of
the code families. As a sample of our results we state:

Theorem 3 Let $C_t$ be the length $n = 2^m$ code obtained by
adding to the dual of a primitive $t$-error-correcting BCH code
the complements of all codewords and then an overall parity
check. Then any non-constant codeword of the $\{+1,-1\}$-valued
version of $C_t$ has PAPR at most


We define the curve $\Omega \subset \mathbb{C}^n$ by

$$\Omega = \{(\exp(2\pi j \zeta t), \ldots, \exp(2\pi j (\zeta + n - 1) t)) : 0 \le t \le 1\}$$

where $\zeta = f_0/f_s$. Typically, $\zeta \gg 1$. There is a geometric
interpretation to the PAPR of a code, showing that the closer
a code lies to the curve $\Omega \cup -\Omega$, the larger its PAPR:


Similar results can be obtained for weighted degree trace codes
and the quaternary versions of the Kerdock and
Delsarte-Goethals codes using bounds for hybrid exponential sums over
Galois rings due to Shanbhag, Kumar and Helleseth. None of
our families is asymptotically good, however. The explicit
construction of such families with low PAPR remains an open
problem.

Abstract — The error performance of various 8-VSB
TCM decoders for reception of terrestrial digital
television is analyzed. In previous work, 8-state TCM
decoders were proposed and implemented for terrestrial
broadcasting of digital television. In this paper, the
performance of a 16-state TCM decoder is analyzed
and simulated. It is shown that not only does a 16-state
TCM decoder outperform one with 8 states, but it
also has much smaller error coefficients.

I.  Introduction 

The Digital Television standard [1,2] describes a broadcasting
system designed to transmit high quality video and audio
as well as data over a single 6 MHz channel. In order to
maximize service area, the terrestrial broadcast mode
incorporates both an NTSC rejection filter (in the receiver) and
trellis coding. When the NTSC rejection filter is activated in
the receiver, a trellis decoder for the combination of a
four-state trellis encoder and the filter is used. In the paper, the
error performance of various TCM decoders is studied, with
and without the NTSC rejection (1-D) filter. In the previous
results [1,2], a combined 8-state trellis decoder is employed
for the case with NTSC rejection filter. We propose a 16-state
TCM decoder and analyze and simulate its error performance.
The results show that the error performance improves, with
respect to 8-state TCM decoders, at the cost of doubling the
memory requirements. In return, a 16-state TCM decoder has
much smaller error coefficients and does not require precoding
to operate.

II.  Encoder model for ATSC terrestrial broadcasting of digital television

In the ATSC terrestrial broadcasting system specification,
the 8-VSB transmission subsystem employs a rate-2/3 4-state
Ungerboeck trellis code, with the uncoded bit precoded. The
4-state feedback encoder and the bits-to-8 PAM symbol mapper
are shown in Fig. 1 (a). The NTSC interference rejection
(comb) filter is a one tap linear-feed-forward (1-D) filter. Its
purpose is to reduce the analog NTSC interference that is
caused by a carrier tone. However, the received signals are
also modified. The 8 signed levels are converted to 15 levels.
While providing needed co-channel interference benefits,
it is well-known that the (1-D) filter degrades white noise
performance by 3 dB. This is because the filter output is the
subtraction of two full gain paths and, as white noise is
uncorrelated from symbol to symbol, the noise power doubles.
There is an additional 0.3 dB degradation due to error
propagation introduced by precoding.

III.  Approximated  error  performance  analysis 

1This  work  was  supported  by  LSI  Logic  Corp. 


In the approximations presented in this section, we interpret
the trellis code with decoding depth $k$ as a terminated
zero-tail (ZT) $(3k, 2k-m)$ block code [3]. (For the four-state
trellis decoder, $m = 2$, while for 8- and 16-state trellis
decoders, $m = 3$ and $m = 4$, respectively.) When plotting
the expressions with respect to the energy per bit-to-noise
ratio ($E_b/N_0$), a rate correction of $R_L$ (dB) is applied, to
account for the rate loss due to trellis termination, where
$R_L = (2k-m)/2k$. With the NTSC interference rejection
filter in the receiver, the number of states in the decoder
increases, due to a signal constellation of increased
dimensionality. In the guide to the ATSC system [2], a (1-D) filter is
recommended that increases the number of signal levels from
8 of the original 8-PAM constellation to 15 at the output of
the filter. To analyze the performance of the eight-state trellis
decoder, a truncated union bound is computed using the
technique of [3] as follows. A polynomial state transition matrix
$\Pi(X)$ for the 8-state trellis is used, with branch weights
equal to $X^d$, where $d$ denotes the squared Euclidean distance
(SED) with respect to the all-zero branch and $X$ is an
indeterminate. For each set of three parallel branches between
two states $i, j$, an element $\pi_{i,j}(X)$ in matrix $\Pi$ is a
polynomial $\pi_{i,j}(X) = X^{d_1} + X^{d_2} + X^{d_3}$, where $d_l$, $l = 1, 2, 3$,
denotes the SED from the branch output to the all-zero
sequence output. Using symbolic mathematical software, the
value of the $k$-th power $\Pi^k(X)$ is computed and the
coefficients of $\pi^{(k)}_{0,0}(X)$ yield the weight distribution of the ZT
$(3k, 2k-3)$ block code. For the 8-state trellis decoder, with
$k = 54$, $\pi^{(54)}_{0,0}(X) = 5696X^{56} + 1520X^{48} + 404X^{40}$ and for the
16-state trellis decoder, $\pi^{(54)}_{0,0}(X) = 840X^{58} + 248X^{48} + 101X^{40}$.
As shown above, the error coefficient for the MSED for the
16-state trellis decoder is much smaller than that of the 8-state
trellis decoder.
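
The matrix-power computation can be sketched without a symbolic package by storing each polynomial as a dictionary from exponent to coefficient. The code below is an illustration only: the helper names are hypothetical, and the 2-state matrix with branch weights 1, X^2, X^2 and X is a made-up toy, not the 8- or 16-state ATSC trellis.

```python
from collections import defaultdict

def poly_add(p, q):
    """Sum of two polynomials stored as {exponent: coefficient}."""
    out = defaultdict(int, p)
    for d, c in q.items():
        out[d] += c
    return dict(out)

def poly_mul(p, q):
    """Product of two polynomials stored as {exponent: coefficient}."""
    out = defaultdict(int)
    for dp, cp in p.items():
        for dq, cq in q.items():
            out[dp + dq] += cp * cq
    return dict(out)

def mat_mul(A, B):
    """Product of two square matrices with polynomial entries."""
    n = len(A)
    C = [[{} for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = {}
            for l in range(n):
                acc = poly_add(acc, poly_mul(A[i][l], B[l][j]))
            C[i][j] = acc
    return C

def mat_pow(A, k):
    """k-th power (k >= 1) by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result

# Toy 2-state transition matrix (hypothetical branch weights).
Pi = [[{0: 1}, {2: 1}],
      [{2: 1}, {1: 1}]]
# Entry (0, 0) of Pi^3 enumerates the length-3 paths that start and end
# in state 0, grouped by accumulated distance.
weights = mat_pow(Pi, 3)[0][0]
```

For the toy matrix the result is 1 + 2X^4 + X^5: one path of weight 0 (the all-zero path), two of weight 4 and one of weight 5.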

IV.  Conclusion 

The simulation and approximated error performance show
that a TCM decoder with 16 states outperforms one with 8
states by approximately 0.33 dB at a BER of 10-s. Finally,
while the 8-state decoder must use a precoder for the uncoded
bit to be able to decode properly, the proposed 16-state
decoder does not and has a better error performance.

Abstract — Channel capacity of OFDM systems
with digital clipping is discussed, under the
assumption that the distortion terms are
Gaussian-distributed.

I.  Introduction. 

One of the major problems in the orthogonal frequency
division multiplexing (OFDM) technique is the high
peak-to-average power ratio (PAPR) property of the signal waveform,
and digital clipping may be employed in order to reduce the
PAPR before amplification. The performance of the clipping
in terms of the PAPR reduction capability and the degradation
in the bit error rate is discussed in, e.g., [1-3]. Without
coding, clipping may be a severe source of performance
degradation. However, in most applications of OFDM,
channel coding may be applied so as to reduce the required
energy to achieve the targeted bit error performance. Therefore,
it is important to study the performance of the clipped
coded OFDM signals. In this paper, the channel capacity of
the clipped OFDM signals is derived and evaluated.

II.  System  Model 

Let $N$ be the number of subcarriers and $J$ be an
oversampling factor of the OFDM signals before clipping. Thus, an
$NJ$-point IDFT will be used to construct the OFDM signals.
In order to efficiently reduce the PAPR of the OFDM signals,
the clipping should be performed with oversampling [1,3]. We
assume that the clipping is followed by the rectangular filter
such that the OFDM signal is tightly band-limited. As
a clipping model, a soft envelope limiter is considered, where
the clipping ratio $\gamma$ is defined as $\gamma = A_{\max}/\sqrt{P_{in}^{total}}$ with
$A_{\max}$ and $P_{in}^{total}$ being the maximum permissible amplitude
and the average power of the OFDM signal before clipping,
respectively.

III.  Channel  Capacity 

By the central limit theorem, the distribution of distortion
components of the clipped OFDM signal can be shown to
approach Gaussian for large $N$. Let $P_{in}[k]$ and $P_{out,s}[k]$ denote
the average (useful) signal power of the $k$th subcarrier before
and after clipping, respectively. Let $P_d[k]$ also denote the
average distortion power of the $k$th subcarrier. Then, without
any constraint in the input signal except the total input power
$\sum_k P_{in}[k] = P_{in}^{total}$, the average channel capacity per subcarrier
will be given by

$$C = \max \frac{1}{N} \sum_{k=0}^{N-1} \log_2(1 + \mathrm{SNDR}_k) \quad \text{bits/subcarrier}. \qquad (1)$$

1This  work  was  supported  in  part  by  the  Research  Fellowship 
from  the  Japan  Society  for  the  Promotion  of  Science  for  Young 
Scientists. 


The inverse of the signal-to-noise-plus-distortion ratio of the
$k$th subcarrier is given by

$$\mathrm{SNDR}_k^{-1} = \frac{P_d[k]}{P_{out,s}[k]} + \frac{1}{N\,\mathrm{SNR}_c} \frac{P_{out}^{total}}{P_{out,s}[k]} \qquad (2)$$

where $P_d^{total}$ is the total average distortion power and
$\mathrm{SNR}_c = P_{out}^{total}/P_{noise}$ is the channel signal-to-noise ratio, with
$P_{out}^{total}$ and $P_{noise}$ being the total power of the output signal and
AWGN at the receiver, respectively. Note that since the
rectangular filter is employed,
$P_{out}^{total} = \sum_{k=0}^{N-1} P_{out,s}[k] + P_d^{total}$
and $P_d^{total} = \sum_{k=0}^{N-1} P_d[k]$.

Since $P_d[k]$ is a function of all the $P_{in}[k]$, the maximization
of (1) seems quite involved. Therefore, we further assume
the constraint that the power allocation of each subcarrier is
equal, i.e., $P_{in}[k] = P_{in}^{total}/N$ for all $k$. Then, (2) reduces to

$$\mathrm{SNDR}_k^{-1} = \frac{1}{\mathrm{SDR}_k} + \frac{1}{\mathrm{SNR}_c}\left(1 + \frac{1}{N}\sum_{k'=0}^{N-1} \frac{1}{\mathrm{SDR}_{k'}}\right) \qquad (3)$$


where the signal-to-distortion ratio of the $k$th subcarrier is
defined as $\mathrm{SDR}_k = P_{out,s}[k]/P_d[k]$, which can be easily calculated as
a function of $\gamma$ by use of infinite series expansion of the
autocorrelation function of the input signal [4]. As a numerical
example, Fig. 1 shows the asymptotic value of the average
capacity for $\mathrm{SNR}_c \to \infty$ without constraint on the input
signaling and $N = 512$. The derivation of channel capacity of
other cases such as QPSK or 16QAM signaling is straightforward.
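
The equal-power case can be evaluated numerically once the per-subcarrier SDR values are known. The sketch below is an illustration only: it assumes the equal-allocation expression reads SNDR_k^{-1} = 1/SDR_k + (1/SNR_c)(1 + mean over k' of 1/SDR_{k'}), which is our reading of the printed formula, and the function name `average_capacity` is hypothetical. Computing the SDR values themselves from the clipping ratio is not shown.

```python
import numpy as np

def average_capacity(sdr, snr_c):
    """Average capacity in bits/subcarrier under equal power allocation.

    sdr   : per-subcarrier signal-to-distortion ratios (linear scale)
    snr_c : channel signal-to-noise ratio (linear scale)
    Assumed relation (our reading of the text):
        1/SNDR_k = 1/SDR_k + (1/SNR_c) * (1 + mean_k'(1/SDR_k'))
    """
    inv_sdr = 1.0 / np.asarray(sdr, dtype=float)
    inv_sndr = inv_sdr + (1.0 + inv_sdr.mean()) / snr_c
    return float(np.mean(np.log2(1.0 + 1.0 / inv_sndr)))

# Sanity checks on the two limiting cases:
# no distortion (SDR -> infinity) gives the AWGN value log2(1 + SNR_c),
# no noise (SNR_c -> infinity) leaves only the distortion term.
c_awgn = average_capacity(np.full(4, np.inf), 3.0)
c_dist = average_capacity([1.0, 1.0], np.inf)
```

The second limit corresponds to the asymptotic regime plotted in Fig. 1, where the capacity is set by the clipping distortion alone.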

[Fig. 1 plot; horizontal axis: Clipping Ratio γ]

Fig. 1 Asymptotic average channel capacity, N = 512.

(The case γ = 0 corresponds to hard envelope limiter.)

This paper describes a construction technique for q-ary
Turbo Codes that computes good recursive systematic
convolutional q-ary constituent codes with constraint
length ν ≤ 5 for q = 2^m, m = 2, 3, and 4. The construction
technique, based on the algorithm in [1], determines the
codes with maximum d_i for i = 2, 3, and 4 and minimum
codeword multiplicity, where d_i is the minimum weight of
all code sequences with input weight i. Due to the large
number of encoder states involved, standard weight
distribution calculations are difficult. The construction
algorithm employed is a computer search that generates all
possible terminating sequences of weight 2, 3, and 4 to use
as inputs to the set of allowable encoders. The best codes
with maximum d_i and minimum multiplicity are determined.
The performance of these Turbo codes using M-ary
(M = 4, 8, 16) noncoherent modulation (FSK) is computed by
determining performance bounds assuming the parallel
concatenation of two constituent codes [2].

M-ary FSK with q-ary Turbo Coding can provide an efficient
modulation/coding solution. The emphasis is on matching
the modulation alphabet with the Turbo coding alphabet to
implement simple modulation/coding approaches that perform
at a lower required E_b/N_0 and greater bandwidth efficiency
than current coding approaches using these modulation
techniques.

The code construction algorithm consists of searching all
possible feedback polynomials of the form
H(D) = h_m D^m + h_{m-1} D^{m-1} + ... + h_1 D + 1 to determine a
maximal length generator. This implies that the polynomial
is primitive. Once a primitive polynomial H(D) is
determined, the minimum parity weight p_i and codeword
multiplicity N_i corresponding to input weights i = 2, 3,
and 4 are computed. The p_i and N_i are computed by
generating all possible terminating weight 2, 3, and 4
sequences to use as inputs into the encoder for all
feedforward polynomials
G(D) = g_m D^m + g_{m-1} D^{m-1} + ... + g_1 D + 1. The construction
algorithm determines all polynomials G(D) and H(D) with
maximum p_i and minimum N_i for i = 2, 3, and 4 as described
in [1]. Some of the associated weight spectra as determined
by the algorithm are shown in Tables 1 and 2.

Table 1: Partial Weight Spectrum of 4-ary R=1/2
Constituent Codes

Table 2: Partial Weight Spectrum of 8-ary R=1/2
Constituent Codes

ν    # of states    p2, N2      p3, N3    p4, N4
1    8              2, 7        2, 7      2, 7
2    64             10, 7       3, 7      3, 14
3    512            66, 7       6, 7      4, 7
4    4096           512, 7      5, 7      6, 7
5    32768          4097, 7     52, 7     3, 7

[1] S. Benedetto, R. Garello, and G. Montorsi, "A Search for
Good Convolutional Codes to be Used in the Construction of
Turbo Codes," IEEE Trans. Commun., vol. 46, pp. 1101-1105,
September 1998.

[2] S. Benedetto and G. Montorsi, "Unveiling Turbo Codes:
Some Results on Parallel Concatenated Coding Schemes," IEEE
Trans. Information Theory, vol. 42, pp. 409-428, March 1996.

This work was supported in part by NSF grant NCR95-22939,
NASA grant NAG5-8355, and Raytheon Systems Company.

Abstract — The design of multilevel superposition
coded modulation schemes using turbo codes is
addressed. A multistage decoding algorithm with
reduced decoding delay is proposed. Simulation results
over an impulsive noise channel are obtained.

Recently  a  superposition  coded  modulation  scheme  (SCM) 
for  transmission  over  a  broadcast  AWGN  channel  was  pro¬ 
posed  in  [1].  The  scheme  uses  a  nonuniformly  spaced  32-QAM 
signal constellation and multilevel coding associated with a
nonstandard partition [2]. The QAM constellation consists of
four 8-PSK subconstellations. The proposed scheme has two
information classes, class A (the more important) and class
B (the less important). In the multilevel construction, class
A is associated with the first two levels (1 and 2) and class
B is associated with the other levels (3, 4 and 5), i.e., with
the 8-PSK subconstellation. The nonstandard partition allows
parallel decoding of levels 1, 2, 3 and 4. This important
feature encourages the use of turbo codes as component codes
of the multilevel construction in applications where there is a
delay constraint. For low-to-medium bit error rates, the
performance of class B can be significantly improved by using
a standard partition associated with the 8-PSK subconstellation.
On the other hand, the standard partition eliminates
the decoding parallelism of levels 3 and 4. To circumvent
this problem we propose a new multistage decoding (MSD)
algorithm which partially recovers this parallelism. Consider
a standard MSD (SMSD) algorithm where each component
code is decoded with three iterations. We propose to use the
decoding structure shown in Fig. 1 instead of SMSD for
decoding the component codes of levels 3, 4 and 5 (class B). After
the first iteration in the third stage the subset information is
passed to the fourth stage and the first iteration in this stage
begins. In the same way, after this first iteration the subset
information is passed to the fifth stage and the first iteration
in this stage begins. The remaining iterations in each stage
are now done in parallel. If D is the delay of each iteration,
the total delay of the new algorithm is 5D while an SMSD
algorithm has a total delay of 9D. In Fig. 1 the dashed boxes
indicate additional iterations that could be done in the third
and fourth stages during the same decoding interval. The
decoding structure described in Fig. 1 can be easily generalized
to cases where the number of iterations in each stage is greater
than three and/or the subset information is passed to the
subsequent stage after more than one iteration.
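
The delay bookkeeping above can be sketched in a few lines of Python. This is only an illustration under the assumption that every iteration costs the same delay D and that each stage starts as soon as the previous one has completed a fixed number of initial iterations; the function names are hypothetical and not taken from the paper.

```python
def sequential_delay(stages: int, iterations: int) -> int:
    """Delay (in units of D) when each stage finishes all iterations before the next starts."""
    return stages * iterations


def pipelined_delay(stages: int, iterations: int, initial: int = 1) -> int:
    """Delay (in units of D) when a stage starts after `initial` iterations of the previous one.

    The last stage starts after (stages - 1) * initial iterations and then
    runs its own `iterations` iterations; earlier stages finish no later.
    """
    if not 1 <= initial <= iterations:
        raise ValueError("initial must lie between 1 and iterations")
    return (stages - 1) * initial + iterations


# Three class-B stages, three iterations each, one initial iteration.
print(sequential_delay(3, 3), pipelined_delay(3, 3, 1))
```

With three stages and three iterations this gives 9D and 5D, and with six iterations per stage it gives 18D, 8D (one initial iteration) and 10D (two initial iterations), which agrees with the delays quoted for the simulations.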

Fig. 2 shows simulation results for class B over an impulsive
noise channel with hit probability and impulsive-to-noise
power ratio equal to 0.1 and 10, respectively. The superposition
gain of the SCM approach [3] was obtained for this impulsive
noise channel based on a cutoff rate parameter, and it justifies
the choice of this approach. In each partition level the turbo
encoder consists of a parallel concatenation of two identically
punctured 16-state RSC encoders with rates 1/2. The resulting
rate distribution for class B levels is $R_3 = 0.44$, $R_4 = 0.70$
and $R_5 = 0.88$. Each turbo encoder uses a pseudo-random
interleaver of size N = 400. All the results were obtained
for MSD algorithms with 6 iterations per stage. The results
for the new MSD (NMSD) algorithm are for the case of 1
(NMSD-1) and 2 (NMSD-2) initial iterations before passing
information to the next stage. For a bit error rate (BER)
equal to $10^{-3}$, the performance degradation of the NMSD-1
algorithm (delay = 8D) relative to the SMSD algorithm (delay = 18D)
is about 0.17 dB. For BER < $10^{-3}$, the performance degradation
of NMSD-2 is negligible (delay = 10D). Fig. 2 also shows
the performance of class B with parallel multistage decoding
(PMSD) of stages 3 and 4 (one initial iteration before passing
information to the fifth stage: delay = 7D), which has 0.3 dB
degradation relative to NMSD-1 at BER = $10^{-3}$. Therefore,
the new algorithm has an excellent trade-off between
performance and decoding delay.

This work was supported partially by Fundação de Amparo à
Pesquisa do Estado de São Paulo under Grant 97/09347-9.


Figure  1:  The  proposed  multistage  decoding  algorithm. 


Figure  2:  Performance  of  the  new  algorithm. 

Abstract - We design parallel concatenated trellis
coded modulation (PC-TCM) schemes at 1
bit/sec/Hz for 8-PSK and 8-QAM over the following
discrete two-dimensional (2D) channels: (a) a slow-fading
Rayleigh channel with discrete carrier tracking
by a phase locked loop (PLL), where the PLL
signal-to-noise ratio (SNR) is proportional to the fading
amplitude squared; (b) an additive white Gaussian noise
(AWGN) channel with a PLL; and (c) a fast-fading
Rician channel with carrier phase estimation for the
line-of-sight (LOS) path only. The fading gain and
phase error are assumed independent over successive
symbols. The codes for channels (a) and (b) perform
within 1 dB of constellation-constrained capacity at
bit error rates of $10^{-6}$, while those for channel (c)
perform within 1.2 dB of constellation-constrained
capacity for LOS-to-diffuse power ratios of 3 dB.

I.  Introduction 

This work extends previous results on PC-TCM for AWGN
and fading channels [1, 2] to the case of partially coherent
fading channels. We consider two important causes for
partial coherence: PLL phase error on slow-fading channels,
and low LOS-to-diffuse power ratios on fast-fading channels.
Equiprobable signaling is assumed throughout the paper.

We consider the discrete-time channel model for a correlator
receiver with PLL

$Y = A X \exp(j\phi) + N$, (1)

where X and Y are the complex channel input and output, A
is the (real-valued) fading gain with $E[A^2] = 1$, $\phi$ is the phase
error, and N is zero-mean complex AWGN with i.i.d. component
variances equal to $N_0/2$. A and $\phi$ are independent of N. For
channel (a), A is Rayleigh distributed and is known at the
receiver. For channel (b), A = 1. For (a) and (b), $\phi$ has
conditional Tikhonov PDF $p(\phi|A) = \exp[\rho\cos(\phi)]/(2\pi I_0(\rho))$,
where the loop SNR $\rho = (E_s/N_0)(\alpha A^2)/(2 B_L T)$ is a function of
average signal energy $E_s$, discrete carrier power fraction $\alpha$,
PLL bandwidth $B_L$, and symbol interval T. For channel (c),
A is Rician distributed with LOS-to-diffuse power ratio $\beta$,
A is unknown at the receiver, and $\phi$ has the angular PDF
corresponding to the Rician amplitude PDF.
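
As a rough numerical illustration of the phase-error model for channels (a) and (b), the sketch below evaluates the loop SNR and the Tikhonov PDF. The function names and the sample parameter values are assumptions made for this example only; the modified Bessel function $I_0$ is computed from its power series so that no external library is needed.

```python
import math


def bessel_i0(x: float) -> float:
    """Modified Bessel function I_0(x) via its series sum_k ((x/2)^k / k!)^2."""
    total, term, k = 0.0, 1.0, 0
    while term * term > 1e-17 * max(total, 1.0):
        total += term * term
        k += 1
        term *= (x / 2.0) / k
    return total


def loop_snr(es_over_n0: float, alpha: float, a: float, bl_t: float) -> float:
    """Loop SNR rho = (Es/N0) * (alpha * A^2) / (2 * B_L * T)."""
    return es_over_n0 * alpha * a * a / (2.0 * bl_t)


def tikhonov_pdf(phi: float, rho: float) -> float:
    """Tikhonov PDF exp(rho * cos(phi)) / (2 * pi * I_0(rho)) on [-pi, pi)."""
    return math.exp(rho * math.cos(phi)) / (2.0 * math.pi * bessel_i0(rho))


# Hypothetical operating point: Es/N0 = 2 (linear), alpha = 0.1, A = 1, B_L T = 0.1.
rho = loop_snr(2.0, 0.1, 1.0, 0.1)
steps = 2000
area = sum(tikhonov_pdf(-math.pi + 2 * math.pi * (i + 0.5) / steps, rho)
           for i in range(steps)) * 2 * math.pi / steps
print(rho, area)  # the area under the PDF should be close to 1
```

Because the loop SNR scales with $A^2$, a deep fade on channel (a) flattens the PDF toward the uniform density $1/(2\pi)$, which is the partial-coherence effect the codes are designed for.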

II.  Code  Design 

Our codes use the bit-interleaved architecture of [1], with
an additional ×2 signal expansion for additional gain on fading
channels. Our 1 bit/sec/Hz rate 2/6 PC-TCM consists
of two 16-state rate 2/4 systematic convolutional encoders,
each of which punctures one of the two systematic inputs for
an effective rate of 2/3. Each encoder addresses one 8-point
constellation.

This work was partially supported by the Spokane Intercollegiate
Research and Technology Institute, grant number Y923112.

Fig. 1: Rayleigh with Tikhonov phase channel: $B_L T = 0.1$, length 16384, S32 and S40 interleavers.

We  search  over  linear  Ungerboeck  encoders  to  maximize 
d\g,  the  minimum  squared  Euclidean  distance  (s.E.d.)  be¬ 
tween  symbol  sequences  corresponding  to  input  sequences  dif¬ 
fering  by  weight  two.  We  choose  one  encoder  that  provides  a 
high  and  nearly  equal  dig  for  all  constellations  and  mappings 
studied.  We  prove  that  s.E.d.  is  in  fact  the  maximum  likeli¬ 
hood  code  design  metric  for  channel  (c).  For  channels  (a)  and 
(b),  the  selected  codes  give  very  good  performance  despite  the 
fact  that  s.E.d.  is  not  the  optimal  design  metric. 

III.  Simulation  Results 

Figure 1 presents 8-iteration turbo-decoding simulations for
channel (a), with the product $B_L T = 0.1$. The MAP algorithm
uses the closed form conditional channel PDF. We
tested two bit mappings for the two-radius 8-QAM: the first
mapped the systematic bit to the two radii, and the second
attempted to minimize bit transitions between nearest neighbors.
The radial mapping is best at $B_L T = 0.1$, but shows no
advantage at $B_L T = 0.01$. The best case mappings perform
within 1 dB of constellation constrained capacity for both
8-QAM and 8-PSK in all simulated cases.

Abstract - In this paper, carrier phase recovery of turbo
coded modulation systems in the AWGN channel is studied.
Decision-directed (DD) and non-data-aided (NDA) synchronization
structures are considered. Simulation results are shown to
illustrate the performance of turbo coded BPSK and QPSK
employing these synchronizers.

I.  Introduction  and  Problem  Statement 

Introduced in 1993, parallel concatenated convolutional codes,
termed turbo codes [1], are a very powerful error correction technique
which outperforms all previously known coding schemes. Turbo
codes are capable of operating very close to the Shannon limit on
AWGN channels with a reasonable encoding and decoding
complexity. The decoding algorithm works in an iterative manner,
decoding the constituent codes of the turbo code separately and
passing the symbol-likelihood information from one decoder to the
other.

Coherent  demodulation  of  carrier-modulated  signals  requires 
precise  knowledge  of  signal  frequency,  phase  and  symbol  period. 
Good  estimation  of  these  parameters  by  synchronization  circuits  in 
the receiver is essential for overall system performance. For turbo
coded  signals  the  synchronization  problem  becomes  especially 
difficult  since  turbo  codes  operate  in  the  region  of  low  SNR. 
Consequently,  a  question  arises  whether  adequate  synchronization 
can  be  obtained  from  the  modulated  signal  itself  and  how  much 
performance  degradation  one  may  expect  with  practical 
synchronization schemes compared to the perfectly synchronized case.

Decision-directed  ML  joint  phase  and  timing  synchronization 
for  turbo  codes  has  been  studied  in  [2].  In  this  paper,  we  focus  our 
attention  on  carrier  phase  recovery  for  turbo  coded  BPSK  and 
QPSK schemes. We consider both decision-directed (DD) and
non-data-aided (NDA) estimation schemes. In the communication system
we analyze, the binary data stream is first differentially encoded and
then, in frames of length N, passed to the turbo encoder. The output
of the turbo encoder is fed to a BPSK or QPSK modulator. After
transmission over an AWGN channel, the received signal is first
downconverted to baseband, and then carrier phase estimation
followed by phase rotation is performed. Channel observations are
then passed to the turbo decoder and, finally, its outputs are
differentially decoded to obtain the estimates of the sent data bits. To
resolve the ambiguity in the phase estimate, a rotationally
invariant turbo code is used.


II.  DD  and  NDA  Phase  Recovery  Structures 

The DD synchronizer we examined is shown in Figure 1. In this
structure, tentative decisions on data are made based on the received
samples $r_k$ rotated in phase according to the obtained phase
estimates. The tentative decisions are then used to remove data from
the received signal. Phase estimates $\hat{\theta}$ are obtained from
samples $z_k$ in the phase estimator according to the ML rule:

$\hat{\theta} = \tan^{-1}(\,\cdots\,)$ (1)

where L is the window length of the estimator. Phase estimates $\hat{\theta}$
are then used to rotate the input samples $r_k$. The resultant samples
$y_k$ are passed to a turbo decoder.
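
The argument of the inverse tangent in (1) is not recoverable from this copy of the text. The sketch below therefore assumes the common ML form for data-removed samples, in which the estimate is the angle of the sum of the $z_k$ over the window (equivalently, the arctangent of the summed imaginary parts over the summed real parts). The function names are hypothetical, and the tentative decisions are assumed to be unit-energy symbols.

```python
import cmath


def remove_data(r, decisions):
    """Strip the modulation by multiplying each sample by the conjugate decision."""
    return [rk * complex(dk).conjugate() for rk, dk in zip(r, decisions)]


def ml_phase_estimate(z):
    """Angle of the sum of the data-removed samples z_k over the window.

    Assumed form: atan(sum Im z_k / sum Re z_k), with the quadrant resolved
    by cmath.phase.
    """
    return cmath.phase(sum(z))


# Noise-free BPSK window rotated by 0.3 rad, with correct tentative decisions.
symbols = [1, -1, 1, 1, -1]
received = [s * cmath.exp(1j * 0.3) for s in symbols]
theta_dd = ml_phase_estimate(remove_data(received, symbols))
print(theta_dd)
```

In a real DD loop the decisions come from the rotated samples themselves, so wrong decisions at low SNR bias the estimate; the example sidesteps that by using the true symbols.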

It is known that for M-PSK signals at low SNR, when no reliable
data estimates exist, an NDA (i.e., data-independent) approach to carrier
phase recovery can be employed. As an NDA carrier phase estimation
scheme we investigated the Viterbi and Viterbi (V&V) synchronizer
[3]. It is an efficient feedforward phase tracking scheme for M-PSK
modulation, which uses a nonlinear function that can be optimized to
minimize the phase error variance. The carrier phase estimate in our
V&V synchronizer is calculated as:

$\hat{\theta} = \frac{1}{M}\arg\left(\sum_{k} F(|r_k|)\exp(jM\arg(r_k))\right)$

where K is the duration of the observation window measured in
symbol intervals.
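
A minimal sketch of a Viterbi and Viterbi style estimator follows, assuming the usual form in which the phase of each sample is multiplied by M to strip the M-PSK modulation and the result is divided by M. The nonlinearity $F(|r_k|) = |r_k|^{\text{power}}$ and the parameter name `power` are assumptions for illustration; the paper does not state which nonlinearity was used.

```python
import cmath


def vv_phase_estimate(r, m: int, power: float = 2.0) -> float:
    """Feedforward NDA phase estimate over one window of samples r.

    Each sample contributes |r_k|^power * exp(j * m * arg(r_k)); the estimate
    is the angle of the sum divided by m, so it is unambiguous only within
    an interval of width 2*pi/m.
    """
    acc = sum((abs(rk) ** power) * cmath.exp(1j * m * cmath.phase(rk)) for rk in r)
    return cmath.phase(acc) / m


# Noise-free QPSK symbols exp(j*2*pi*k/4), rotated by 0.2 rad.
qpsk = [cmath.exp(2j * cmath.pi * k / 4) for k in (0, 1, 2, 3, 1, 0)]
theta_qpsk = vv_phase_estimate([s * cmath.exp(0.2j) for s in qpsk], 4)

# Noise-free BPSK symbols rotated by -0.4 rad.
theta_bpsk = vv_phase_estimate([s * cmath.exp(-0.4j) for s in (1, -1, -1, 1)], 2)
print(theta_qpsk, theta_bpsk)
```

The $2\pi/M$ ambiguity of this estimate is the reason a rotationally invariant code and differential encoding are used in the system described in Section I.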

III.  Numerical  Results 

For performance analysis of the considered synchronizers in turbo
coded systems, computer simulations were performed. We examined a
rate-1/2 turbo code with a 16-state $(37/31)_8$ RSC encoder and block
lengths of 256, 1024 and 16384 bits. Window lengths of the
estimators were selected following the results of cycle slipping
analysis. Phase estimators with windows of size 30, 60 and 80
samples were studied.

The simulation results show that for both modulation schemes
the DD synchronizer performs better than V&V with the same
window size. However, the energy loss for BPSK is only 0.1-0.2 dB.
For QPSK we observe a larger SNR penalty of 0.5-0.7 dB. Comparing
the achieved performance with that of perfectly synchronized
schemes, the turbo decoder with the DD scheme
exhibits a loss of about 0.1-0.5 dB at BER = $10^{-4}$, depending on the
window size and block length.
Fig. 1. Decision-directed synchronizer: PE - phase estimator.

Abstract - We investigate the main parameter
in the function of undetected error probability for
shortened binary cyclic codes: the number of minimum
weight words $A_d$. A method for its calculation is
presented.


I.  Introduction 

Let C be an [n, k, d] binary cyclic code, generated by the
polynomial $g(x)$, $\deg(g) = p = n - k$. Every codeword
$c(x) \in C$ can be represented as $x^p f(x) + (x^p f(x) \bmod g(x))$,
where $f(x) = f_{n-p-1}x^{n-p-1} + f_{n-p-2}x^{n-p-2} + \cdots + f_1 x + f_0$.

Let $\{A_i\}_{i=0}^{n}$ be the weight distribution of C. A CRC code
$D[n - n_c, k - n_c]$ is obtained from C by shortening in the first
$n_c$ positions. Let us denote by $A_d^{n_c}$ the number of words of
minimum weight for $D[n - n_c, k - n_c]$. For a BSC let us denote
by $\varepsilon \in [0, 1/2]$ the channel error rate. Then the probability
of undetected error for C is $P_{ud}(C, \varepsilon) = \sum_{i=d}^{n} A_i \varepsilon^i (1 - \varepsilon)^{n-i}$.
For low values of $\varepsilon$ the most important parameter of the
function $P_{ud}(C, \varepsilon)$ is $A_d$. We present a method for fast calculation
of the value of $A_d$. This will allow us to investigate the
performance of several standards for $n - n_c > 1000$. Previous
algorithms for calculation of $P_{ud}$ are due to Fujiwara-Kasami-Kitai-Lin
[2] and Castagnoli-Brauer-Herrmann [1].
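
To illustrate why $A_d$ dominates at low error rates, the following sketch evaluates the undetected error probability from a weight distribution. The example code, the [7, 4, 3] Hamming code with weight distribution (1, 0, 0, 7, 7, 0, 0, 1), is chosen here for illustration and is not one of the standards studied in the paper.

```python
def undetected_error_probability(weights, eps: float) -> float:
    """P_ud = sum_{i>=d} A_i * eps^i * (1 - eps)^(n - i), with weights[i] = A_i."""
    n = len(weights) - 1
    return sum(a * eps ** i * (1.0 - eps) ** (n - i)
               for i, a in enumerate(weights) if i > 0 and a)


hamming_7_4 = [1, 0, 0, 7, 7, 0, 0, 1]
p_half = undetected_error_probability(hamming_7_4, 0.5)
p_small = undetected_error_probability(hamming_7_4, 1e-3)
print(p_half, p_small / 1e-3 ** 3)  # second value is close to A_3 = 7
```

At $\varepsilon = 1/2$ every nonzero codeword is equally likely as an error pattern, giving $(2^k - 1)/2^n$; for small $\varepsilon$ the ratio $P_{ud}/\varepsilon^d$ approaches $A_d$.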

II.  The  Method  for  General  Case 

Let $n = \mathrm{ord}(g(x))$. We have $D[n-1, k-1] = C \cap \{c : f_{n-p-1} = 0\}$,
$D[n-2, k-2] = C \cap \{c : f_{n-p-1} = f_{n-p-2} = 0\}$
and so on.

Definition 1 Denote $C_d = \{c : c(x) = x^p f(x) + (x^p f(x) \bmod g(x)) \in C, \mathrm{wt}(c) = d\}$,
$Q(i) = \#\{c(x) : c(x) \in C_d, f_i = 1\}$,
$Q_j(i) = \#\{c(x) : c(x) \in C_d, f_i = f_j = 1\}$.

As C[n, k] is a cyclic code we have $Q = Q(0) = \cdots = Q(n-p-1) = dA_d/n$.
It is clear that $Q_j = Q_j(0) = Q_{j+1}(1) = \cdots = Q_{j+n-p-1}(n-p-1)$.
Let $S \subset \{1, \ldots, n-p-1\}$.

Definition 2 $Q_S(i) = \#\{c(x) : c(x) \in C_d, f_i = f_j = 1 \text{ for all } j \in S\}$.

We have $Q_S = Q_S(0) = Q_{S+1}(1) = Q_{S+2}(2) = \ldots$ and
$A_d^{2} = A_d - 2Q + Q_1$; $A_d^{3} = A_d - 3Q + 2Q_1 + Q_2 - Q_{1,2}$.
Counting by the inclusion-exclusion principle, we obtain Theorem 3,
whose alternating sum ends with the term

$+ (-1)^d \sum_{i_1=1}^{n_c-d+1} \cdots \sum_{i_d=i_{d-1}+1}^{n_c} Q_{i_2-i_1, \ldots, i_d-i_1}$.
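
The first inclusion-exclusion identity, $A_d^{2} = A_d - 2Q + Q_1$, can be checked by brute force on a small cyclic code. The sketch below is an illustration only: it enumerates all information polynomials, which is feasible for toy lengths but not for the lengths targeted by the paper, and the helper names are hypothetical. Polynomials over GF(2) are stored as integers whose bit i is the coefficient of $x^i$.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) modulo g(x) over GF(2)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a


def shortening_counts(g: int, n: int, d: int):
    """Return (A_d, Q, Q_1, A_d^2) for the cyclic code of length n generated by g."""
    p = g.bit_length() - 1
    k = n - p
    # Information words f whose systematic codeword x^p f + (x^p f mod g) has weight d.
    words = []
    for f in range(1 << k):
        shifted = f << p
        codeword = shifted ^ poly_mod(shifted, g)
        if bin(codeword).count("1") == d:
            words.append(f)
    top, second = 1 << (k - 1), 1 << (k - 2)
    a_d = len(words)
    q = sum(1 for f in words if f & top)                  # f_{n-p-1} = 1
    q1 = sum(1 for f in words if f & top and f & second)  # two adjacent positions set
    a_short = sum(1 for f in words if not f & (top | second))
    return a_d, q, q1, a_short


# [7, 4, 3] Hamming code generated by x^3 + x + 1, shortened in two positions.
print(shortening_counts(0b1011, 7, 3))
```

For this code the counts agree with the closed forms used in the Hamming case: $Q = dA_d/n = 3$ and $Q_1 = 1$.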


III.  The  Method  for  Hamming  Code 

Let g(x) be a primitive polynomial which generates a Hamming
code C[n, k], $n = 2^p - 1$. We have $Q_1 = \cdots = Q_{n-p-1} = 1$
and $Q = 2^{p-1} - 1$. By definition, we have $Q_{m,j} = 1$ iff
$g(x) \mid x^m + x^j + 1$ and $Q_{m,j} = 0$ otherwise. Consequently $Q_{m,j}$
depends on g(x) and we also write $Q_{m,j}(g)$. For the Hamming
case Theorem 3 takes the following form:

Theorem 4 For binary shortened Hamming codes with $n = 2^p - 1$,
$k = n - p$ we have

$A_3^{n_c} = A_3 - n_c Q + \frac{n_c(n_c - 1)}{2} - \sum_{j=1}^{n_c-2} \sum_{m=j+1}^{n_c-1} (n_c - m)\, Q_{m,j}$.

IV.  Algorithm  for  Computing 

Let  g(x)  be  a  primitive  polynomial  of  degree  p. 

Definition 5 Let us denote $g_t(x) = \gcd(g(x^t), x^n - 1)$, for
$\gcd(t, n) = 1$.

Lemma 6 If $(t, n) = 1$ then $Q_{i,j}(g) = Q_{it \bmod n,\, jt \bmod n}(g_t)$.

The polynomial $g_t(x)$ is also primitive of degree p. We
calculate $Q_{m,j}$ for g(x). In fact, the polynomials $g_{t2^l}(x)$, $l = 0, 1, \ldots$, are the
same and we can group the coefficients t into a few sets. For
each such set, we calculate the values of $Q_{it \bmod n,\, jt \bmod n}(g_t)$
and $A_d^{n_c}(g_t)$. It remains to mark the minimum values of $A_d^{n_c}$.

V.  Practical  Realization  for  p  =  16 

For d = 3 there are 8 classes of polynomials with orders
between 32768 and 65535. For each of them we calculate the
values of $A_d^{n_c}$ between 32768 and its order for all sets of the
class and find the best results.

Similar  procedures  were  developed  for  d  =  4. 


Theorem 3 For a binary CRC code $D[n - n_c, n - p - n_c]$
generated by g(x), ord(g) = n and $1 < n_c < n$ we have

$A_d^{n_c} = A_d - n_c Q + \sum_{i_1=1}^{n_c-1} \sum_{i_2=i_1+1}^{n_c} Q_{i_2-i_1} - \cdots$

Abstract - Knowledge of the weight distribution
of the coset leaders of a given binary linear code
is important for evaluating the error performance
of the code. An algorithm is proposed for computing
the weight distribution of the coset leaders. Using this
algorithm, we have computed the weight distribution
of the coset leaders of (N, K) Reed-Muller codes, binary
primitive BCH codes, and their extended codes
with $N \le 128$ and $N - K \le 42$.

I.  Weight  Distribution  of  Coset  Leaders 

A minimum Hamming weight vector in a coset of an (N, K)
binary linear block code C is called the coset leader. Let $\alpha_i$
denote the number of coset leaders of C whose weight is i
with $0 \le i \le N$. Then, $\sum_{i=0}^{N} \alpha_i = 2^{N-K}$. The sequence
$(\alpha_0, \alpha_1, \ldots, \alpha_N)$ is called the weight distribution of the coset
leaders (WDCL) of C. Knowledge of the WDCL is important
for evaluating the error performance of the code.

All the weight distributions of the cosets of some
well-structured codes have been computed. Such codes include
the r-th order Reed-Muller code of length $2^m$ with r = 1 and
$m \le 5$ and with r = 2 and m = 5, the two-error-correcting
(binary primitive) BCH codes and their extended codes and
some three-error-correcting extended BCH codes (see [1] for
example). In [2], Wadayama et al. have proposed a trellis-based
algorithm for computing WDCL. Using the algorithm,
they have computed WDCL of Reed-Muller codes, BCH codes,
and extended BCH codes with $N \le 128$ and $N - K \le 28$.

II.  Algorithm  for  Computing  WDCL 

The algorithm proposed in [2] for computing WDCL uses
a syndrome trellis. The straightforward way to search for
minimum weight paths is to apply the Viterbi algorithm to the
entire syndrome trellis. This computing method is simple,
but the space complexity is $O(2^{N-K})$. The algorithm we
propose requires less space. The key to reducing the
required space is that we compute WDCL using
two smaller syndrome trellises.

We first form a check matrix H of C in $(n_1, n_2)$-normal
form as shown in Figure 1. Let $V_m$ be the set of m-tuples
over GF(2), and let $w(s, H_i)$, $i = 1, 2$, be the weight of the
coset leader $u \in V_{n_i}$ of the code with parity check matrix
$H_i$ and $u H_i^T = s \in V_{r_i + r'}$. The weights $w(s, H_i)$ for
every $s \in V_{r_i + r'}$ are computed using two syndrome trellises
constructed for $H_1$ and $H_2$ with total space complexity of
$O(2^{\max\{r_1, r_2\} + r'})$. Then the weights of the coset leaders of C
can be computed using the following theorem.

Theorem 1. For $s_1 \in V_{r_1}$, $s_2 \in V_{r_2}$, and $s' \in V_{r'}$,
$w(s_1 \circ s' \circ s_2, H) = \min_{s'' \in V_{r'}} \{w(s_1 \circ s'', H_1) + w((s' + s'') \circ s_2, H_2)\}$. □

Figure 1: Check matrix in $(n_1, n_2)$-normal form.

Here $u \circ v$ denotes the concatenation of vectors u and v. The
additional space complexity to compute the above formula
is only O(N). Therefore the total space complexity of our
algorithm is $O(N + 2^{\max\{r_1, r_2\} + r'})$.
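
The combination step of Theorem 1 can be sketched on a toy example. Since Figure 1 is not reproduced here, the block layout below is an assumption: the first $r_1$ rows of H check only the first $n_1$ coordinates, the last $r_2$ rows check only the last $n_2$ coordinates, and the middle $r'$ rows check both. The toy matrices and all function names are hypothetical, and a brute-force table stands in for the trellis search.

```python
from itertools import product

INF = float("inf")


def syndrome(u, rows):
    """Syndrome of the binary vector u for a check matrix given as a list of rows."""
    return tuple(sum(h * x for h, x in zip(row, u)) % 2 for row in rows)


def leader_weights(rows, n):
    """Table syndrome -> coset leader weight, by exhaustive search over V_n."""
    table = {}
    for u in product((0, 1), repeat=n):
        s = syndrome(u, rows)
        table[s] = min(table.get(s, INF), sum(u))
    return table


def combined_weight(s1, sp, s2, w1, w2):
    """Right-hand side of Theorem 1: minimise over all s'' in V_{r'}."""
    best = INF
    for spp in product((0, 1), repeat=len(sp)):
        shifted = tuple(a ^ b for a, b in zip(sp, spp))
        best = min(best, w1.get(s1 + spp, INF) + w2.get(shifted + s2, INF))
    return best


# Toy blocks with r1 = r' = r2 = 1 and n1 = n2 = 3.
A1, B1 = [1, 1, 0], [0, 1, 1]
B2, A2 = [1, 0, 1], [0, 1, 1]
H1, H2 = [A1, B1], [B2, A2]
H = [A1 + [0, 0, 0], B1 + B2, [0, 0, 0] + A2]

w1, w2, w = leader_weights(H1, 3), leader_weights(H2, 3), leader_weights(H, 6)
theorem_holds = all(
    combined_weight((a,), (b,), (c,), w1, w2) == w.get((a, b, c), INF)
    for a, b, c in product((0, 1), repeat=3)
)
print(theorem_holds)
```

Only the two small tables `w1` and `w2` need to be stored; the minimisation itself touches one candidate $s''$ at a time, which is the source of the space saving claimed in the text.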

III.  Computational  Results 

Using  the  proposed  algorithm,  we  have  newly  computed 
WDCLs  for  the  (64,22)  and  (128,99)  Reed-Muller  codes,  the 
(63,24),  (63,30),  and  (127,92)  binary  primitive  BCH  codes, 
and  the  (64,24),  (64,30),  (128,92),  and  (128,99)  extended 
binary  primitive  BCH  codes.  The  WDCLs  for  the  (64,  30) 
extended  binary  primitive  BCH  code  and  the  (128, 99)  Reed- 
Muller  code  are  shown  in  Table  1. 


Table 1: WDCLs for the EBCH(64,30) and RM(128,99) codes.

            EBCH(64,30)     RM(128,99)
α_0                   1              1
α_1                  64            128
α_2               2,016          8,128
α_3              41,664        341,376
α_4             635,376      5,293,995
α_5           7,624,512     42,330,624
α_6          74,974,368    148,487,892
α_7         607,475,136    225,763,328
α_8       3,598,400,997    114,645,440
α_9       7,749,340,480              0
α_10      4,915,906,912              0
α_11        225,452,736              0

N 

Abstract - General expressions are derived for the
weight distributions of the binary product codes
$\mathcal{S}_{m_1} \otimes \mathcal{S}_{m_2}$, $\mathcal{R}_{m_1} \otimes \mathcal{R}_{m_2}$,
$\mathcal{E}_{m_1} \otimes \mathcal{S}_{m_2}$, and $\mathcal{S}_{m_1} \otimes \mathcal{R}_{m_2}$,
where $\mathcal{S}_m$ is the $[2^m - 1, m, 2^{m-1}]$ simplex code, $\mathcal{R}_m$ is
the $[2^m, m + 1, 2^{m-1}]$ first order Reed Muller code, and
$\mathcal{E}_m$ is the $[m, m - 1, 2]$ even weight code.


I.  Summary 

Previous Work: The weight distributions of most families of
product codes are unknown. MacWilliams and Sloane [1] give
the weights which occur in the product of simplex codes.
Tolhuizen et al. [2] have determined the number of codewords
of low weight for arbitrary linear row and column codes. The
weight enumerator of the dual of the product of even parity
codes is also known [3].

Preliminaries: Consider linear $[n_i, k_i, d_i]$ codes $C_i$ for $i = 1, 2$
over GF(q). Their direct product $C_1 \otimes C_2$ is an $[n_1 n_2, k_1 k_2, d_1 d_2]$
linear code whose codewords A form an $n_1 \times n_2$ matrix whose
columns are codewords of $C_1$ and whose rows are codewords of
$C_2$. A contains a $k_1 \times k_2$ submatrix M of information symbols.
Define as information columns those columns of A coinciding
with columns of M; define the remaining columns as parity check
columns. The parity check columns of $C_1 \otimes C_2$ are defined by
the same linear combinations of information columns as the
parity check bits of $C_2$ are linear combinations of the
information bits. Clearly, rank A = rank M.


Main Results: The weight distributions of five families of
product codes are given in Theorems 1-5. Outline proofs are given
only for Theorems 1-2.


Theorem 1 In the product $\mathcal{S}_{m_1} \otimes \mathcal{S}_{m_2}$ of simplex codes,
$A_i = 0$ unless $i = p(r) = 2^{m_1+m_2-1} - 2^{m_1+m_2-1-r}$ for
$r = 0, 1, \ldots, \min\{m_1, m_2\}$. Then $A_0 = 1$, and

$A_{p(r)} = \prod_{i=0}^{r-1} \frac{(2^{m_1} - 2^i)(2^{m_2} - 2^i)}{2^r - 2^i}$

for $r = 1, 2, \ldots, \min\{m_1, m_2\}$.

Outline of proof. If rank M = r, the $2^{m_2} - 1$ columns of A
comprise all $2^r - 1$ non-zero codewords of an r-dimensional
subcode of $\mathcal{S}_{m_1}$ repeated $2^{m_2-r}$ times, and additionally, $2^{m_2-r} - 1$
zero columns. $A_{p(r)}$ equals the number of distinct $m_1 \times m_2$
matrices of rank r.
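
The count in Theorem 1 is the number of $m_1 \times m_2$ binary matrices of rank r, so it can be evaluated directly and checked against the total number of codewords, $2^{m_1 m_2}$. The sketch below does this; the function names are chosen for this example.

```python
def rank_r_matrices(m1: int, m2: int, r: int) -> int:
    """Number of m1 x m2 matrices over GF(2) of rank r (product formula of Theorem 1)."""
    num = den = 1
    for i in range(r):
        num *= (2 ** m1 - 2 ** i) * (2 ** m2 - 2 ** i)
        den *= 2 ** r - 2 ** i
    return num // den


def simplex_product_weights(m1: int, m2: int) -> dict:
    """Weight distribution {weight: count} of the product of two simplex codes."""
    top = 2 ** (m1 + m2 - 1)
    return {top - top // 2 ** r: rank_r_matrices(m1, m2, r)
            for r in range(min(m1, m2) + 1)}


dist = simplex_product_weights(2, 3)
print(dist, sum(dist.values()))
```

For $m_1 = 2$, $m_2 = 3$ the product is a [21, 6, 8] code, and the distribution has 21 words of weight 8 and 42 words of weight 12 in addition to the zero word.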


Outline of proof (Theorem 2). If rank M = r, the $2^{m_2}$ columns of A
comprise all codewords of either (i) a coset, with representative
$x \in \mathcal{R}_{m_1}$, of an $(r-1)$-dimensional subcode of $\mathcal{R}_{m_1}$
repeated $2^{m_2-r+1}$ times, or
(ii) an r-dimensional subcode of $\mathcal{R}_{m_1}$ repeated $2^{m_2-r}$ times.
$A_{p(r)}$ equals the number of distinct M corresponding to (ii)
in which the subcode does not contain the codeword of weight
$2^{m_1}$.


Theorem 3 In the product $\mathcal{S}_{m_1} \otimes \mathcal{R}_{m_2}$ of a simplex code and
a first order Reed Muller code, $A_i = 0$ unless $i = p(r) =
2^{m_1+m_2-1} - 2^{m_1+m_2-1-r}$ for $r = 0, 1, \ldots, \min\{m_1, m_2\}$ or
$i = 2^{m_1+m_2-1}$. Then $A_0 = 1$, and

$A_{p(r)} = \prod_{i=0}^{r-1} \frac{(2^{m_1} - 2^i)(2^{m_2+1} - 2^{i+1})}{2^r - 2^i}$

for $r = 1, 2, \ldots, \min\{m_1, m_2\}$. There is no explicit expression
for $A_{2^{m_1+m_2-1}}$, which is determined by $\sum_i A_i = 2^{m_1(m_2+1)}$.


Theorem 4 In the product $\mathcal{E}_{m_1} \otimes \mathcal{S}_{m_2}$ of an even weight and
a simplex code, $A_i = 0$ unless $i = p(r) = r 2^{m_2-1}$ for
$r = 0, 2, 3, \ldots, m_1$. Then $A_0 = 1$, and

$A_{p(r)} = \binom{m_1}{r} 2^{-m_2} \{(2^{m_2} - 1)^r + (-1)^r (2^{m_2} - 1)\}$

for $r = 2, 3, \ldots, m_1$.
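
A quick consistency check of the counts in Theorem 4 is sketched below. It assumes the following reading, which is an interpretation rather than something stated explicitly in the text: a codeword has r non-zero rows, each a non-zero simplex codeword of weight $2^{m_2-1}$, the rows must sum to zero, and the count is the number of ways to choose the r row positions times the number of ordered r-tuples of non-zero simplex codewords summing to zero. The function name is hypothetical.

```python
from math import comb


def even_simplex_weights(m1: int, m2: int) -> dict:
    """Weight distribution {weight: count} of the even-weight x simplex product code."""
    q = 2 ** m2
    dist = {0: 1}
    for r in range(2, m1 + 1):
        # Ordered r-tuples of non-zero vectors of GF(2)^m2 that sum to zero.
        zero_sum_tuples = ((q - 1) ** r + (-1) ** r * (q - 1)) // q
        if zero_sum_tuples:
            dist[r * 2 ** (m2 - 1)] = comb(m1, r) * zero_sum_tuples
    return dist


dist_e3_s2 = even_simplex_weights(3, 2)
print(dist_e3_s2, sum(dist_e3_s2.values()))
```

Under this reading the counts add up to $2^{(m_1-1)m_2}$, the number of codewords of a product code of dimension $(m_1 - 1)m_2$, which is a useful sanity check on the formula.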


Theorem 5 The product $\mathcal{E}_{m_1} \otimes \mathcal{R}_{m_2}$ of an even weight and
first order Reed Muller code has non-zero codewords of weight

$p(i) = (m_1 + i) 2^{m_2-1}$, $0 \le |i| \le m_1 - 2$

and, iff $m_1$ is even, a single codeword of weight $m_1 2^{m_2}$. The
number of codewords of weight $p(i)$ is


where 

_  (2m2  -l)r  +  (-l)r(2m2  -1) 
r  2m2  +  l“ 'r 

for  r  =  2,3, . . . ,  mi,  and  Hi  —  (  m7+. )  i/mi+i  =  0  (mod  4), 

2 

and  is  0  otherwise. 


Theorem 2 In the product $\mathcal{R}_{m_1} \otimes \mathcal{R}_{m_2}$ of first order Reed
Muller codes, $A_i = 0$ unless $i = p(r) = 2^{m_1+m_2-1} \pm
2^{m_1+m_2-1-r}$ for $r = 0, 1, \ldots, \min\{m_1, m_2\}$ or $i = 2^{m_1+m_2-1}$. Then
$A_0 = A_{2^{m_1+m_2}} = 1$, and

$A_{p(r)} = \prod_{i=0}^{r-1} \frac{(2^{m_1+1} - 2^{i+1})(2^{m_2+1} - 2^{i+1})}{2^r - 2^i}$

for $r = 1, 2, \ldots, \min\{m_1, m_2\}$. There is no explicit expression
for $A_{2^{m_1+m_2-1}}$, which is determined by $\sum_i A_i = 2^{(m_1+1)(m_2+1)}$.

Abstract - In this paper, we present a general formula
for the asymptotic largest minimum distance
(in blocklength) of deterministic block codes under
generalized distance functions (not necessarily additive,
symmetric and bounded). An alternative expression
for the same formula, which is useful in characterizing
the general Varshamov-Gilbert bound, is next
addressed.

I.  Introduction 

The problem of determining the asymptotic largest minimum
distance among block codes can be described as follows.
Determine the asymptotic ratio (in n) of the n-fold largest
minimum distance among M selected codewords divided by the
code blocklength n, subject to a fixed rate $R = \log(M)/n$ over
a given code alphabet and a given measurable function on the
'distance' between two code symbols.

Research on this problem has been done for years. Up
to the present, only bounds on this ratio have been established.
Recently, channels without statistical assumptions such as
memorylessness, information stability, stationarity, causality, and
ergodicity have been successfully handled by employing the
notions of liminf in probability and limsup in probability of
the information spectrum [2]. Inspired by such probabilistic
methodology, together with a random-coding scheme with
expurgation, a spectrum formula on the largest minimum
distance of deterministic block codes for generalized distance
functions (not necessarily additive, symmetric and bounded)
is established [1]. An alternative expression for the same
formula in terms of large deviations is next addressed.

II.  Formulas  for  the  asymptotic  largest 

MINIMUM  DISTANCE  OF  BLOCK  CODES 

Denote the n-tuple code alphabet by $\mathcal{X}^n$. For any two
elements $x^n$ and $\hat{x}^n$ in $\mathcal{X}^n$, we use $\mu_n(x^n, \hat{x}^n)$ to denote the
n-fold measure on the "distance" of these two elements. A
codebook with block length n and size M is represented by

$\mathcal{C}_{n,M} = \{c_0^{(n)}, c_1^{(n)}, c_2^{(n)}, \ldots, c_{M-1}^{(n)}\}$,

where $c_m^{(n)} = (c_{m1}, c_{m2}, \ldots, c_{mn})$, and each $c_{mk}$ belongs to $\mathcal{X}$.
We define the minimum distance and the largest minimum
distance respectively as

$d_m(\mathcal{C}_{n,M}) = \min_{0 \le \hat{m} \le M-1,\ \hat{m} \ne m} \mu_n(c_m^{(n)}, c_{\hat{m}}^{(n)})$

and

$d_{n,M} = \max_{\mathcal{C}_{n,M}} \min_{0 \le m \le M-1} d_m(\mathcal{C}_{n,M})$.

Note that there is no assumption on the code alphabet $\mathcal{X}$ and
the sequence of the functions $\{\mu_n(\cdot, \cdot)\}_{n \ge 1}$. For simplicity, $\hat{X}^n$
and $X^n$ are used specifically to denote two independent random
variables having common distribution $P_{X^n}$ throughout.
The natural logarithm is employed unless otherwise stated.

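Theorem 1 (distance-spectrum formula)

$\sup_X \bar{\Lambda}_X(R) \ge \limsup_{n\to\infty} \frac{d_{n,M}}{n} \ge \sup_X \bar{\Lambda}_X(R + \delta)$

and

$\sup_X \underline{\Lambda}_X(R) \ge \liminf_{n\to\infty} \frac{d_{n,M}}{n} \ge \sup_X \underline{\Lambda}_X(R + \delta)$

for every $\delta > 0$, where

$\bar{\Lambda}_X(R) = \inf\left\{a \in \Re : \limsup_{n\to\infty} \left(\Pr\left[\frac{1}{n}\mu_n(\hat{X}^n, X^n) > a\right]\right)^{M} = 0\right\}$

and

$\underline{\Lambda}_X(R) = \inf\left\{a \in \Re : \liminf_{n\to\infty} \left(\Pr\left[\frac{1}{n}\mu_n(\hat{X}^n, X^n) > a\right]\right)^{M} = 0\right\}$.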


'This  work  was  supported  by  National  Science  Council,  Taiwan, 
R.O.C.,  from  NSC  87-2213-E-009-139-  and  by  Univ.  of  Maryland, 
College  Park,  MD,  U.S.A. 


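We next derive an alternative expression for the formulas
derived above.

Lemma 1 (large deviation formulas for $\bar{\Lambda}_X(R)$ and
$\underline{\Lambda}_X(R)$)

$\bar{\Lambda}_X(R) = \inf\{a \in \Re : \bar{\ell}_X(a) < R\}$

and

$\underline{\Lambda}_X(R) = \inf\{a \in \Re : \underline{\ell}_X(a) < R\}$,

where $\bar{\ell}_X(a)$ and $\underline{\ell}_X(a)$ are respectively the sup- and the
inf-large deviation spectrums of $(1/n)\mu_n(\hat{X}^n, X^n)$, defined as

$\bar{\ell}_X(a) = \limsup_{n\to\infty} -\frac{1}{n}\log\Pr\left\{\frac{1}{n}\mu_n(\hat{X}^n, X^n) \le a\right\}$

and

$\underline{\ell}_X(a) = \liminf_{n\to\infty} -\frac{1}{n}\log\Pr\left\{\frac{1}{n}\mu_n(\hat{X}^n, X^n) \le a\right\}$.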
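
As a hedged illustration of how the large deviation form connects to the Varshamov-Gilbert bound, consider the special case (chosen here, not taken from the paper) of a binary alphabet, Hamming distance, and i.i.d. uniform $X^n$. The normalized distance between two independent words is then Binomial(n, 1/2)/n, whose lower-tail exponent is $\log 2 - h(a)$ for $0 \le a \le 1/2$, with $h$ the binary entropy in nats. The sketch below solves $\inf\{a : \log 2 - h(a) < R\}$ by bisection; since the theorem takes a supremum over X, this single choice of X yields one achievable value only.

```python
import math


def entropy_nats(a: float) -> float:
    """Binary entropy function in nats."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log(a) - (1.0 - a) * math.log(1.0 - a)


def spectrum(a: float) -> float:
    """Exponent of Pr{Binomial(n, 1/2)/n <= a} for 0 <= a <= 1/2."""
    return math.log(2.0) - entropy_nats(a)


def lambda_uniform(rate: float, tol: float = 1e-12) -> float:
    """inf{a : spectrum(a) < rate} for 0 < rate <= log 2, found by bisection."""
    if not 0.0 < rate <= math.log(2.0):
        raise ValueError("rate must lie in (0, log 2] nats")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if spectrum(mid) < rate:
            hi = mid
        else:
            lo = mid
    return hi


delta = lambda_uniform(0.5 * math.log(2.0))  # rate 1/2 bit per symbol
print(delta)
```

The value returned for rate 1/2 bit per symbol is the familiar Varshamov-Gilbert relative distance of about 0.11.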

Abstract - An adaptive error protection scheme
for embedded source coders is presented. Convolutional
codes of very high memory and regressive
redundancy are applied to encode the data frame of the
source. The channel decoder with scalable complexity
and delay employs modified sequential decoding.
In contrast to conventional coding systems, the
principal idea of this new algorithm, the 'Far End Error
Decoder (FEED)', is to delay the first error as far to
the end of the frame as possible rather than to aim at
a low average error rate. The proposed system is
particularly appropriate for transmitting over strongly
varying channels.

I.  Introduction 

In today's heterogeneous network and application world,
communication takes place between users with a wide range of
different bandwidth resources, computational capabilities,
performance requirements and transmission conditions. This has
initiated a growing interest in multiresolution or progressive
transmission source coding. Multiresolution source coders are
data compression algorithms in which simple low-rate source
descriptions are embedded in more complex high-rate
descriptions. Therefore, we refer to these coders as embedded source
coders. While for multiresolution coders [1] the refinement
steps are large, we usually have very small step sizes in the case of
progressive coding. In principle each bit could refine the
information, i.e., lower the distortion. Theoretical analysis [1, 2]
and practical applications in image coding [3] show that the
loss due to progressive transmission is almost negligible,
considering the advantages provided by progressive source coders.

However, the performance of these coders in error prone
environments such as mobile channels degrades significantly,
as even a single error in the low-rate description usually results
in a useless high-rate description. To avoid the complete loss
of the low-rate description, unequal error protection (UEP)
schemes are applied to embedded source coders [5].
Additionally, using data after the first (undetected) channel decoding
error for source reconstruction in general does not improve,
but might even decrease, the quality. Therefore, localizing the
first error is essential. This requires an outer error detection
code if the inner code is a convolutional code. To avoid a
huge overhead due to this error detection coding, the source
data has to be blocked in relatively large units. Hence,
standard coding schemes are not appropriate for embedded source
coders. We will present an error protection scheme adapted
to embedded source coders.

II.  Far  End  Error  Decoding 

Following the discussion above, the progressively coded source
does not require a certain frame or bit error rate but rather an
error-free part from the beginning of the frame as far to the
end as possible. The key feature of our channel coding system
is therefore the Far End Error Decoder (FEED), which delays
the first decoding error as far out as possible. To achieve this
we employ

•  very  high  memory  systematic  convolutional  codes, 

•  a  regressive  redundancy  profile  by  puncturing  (results 
from  channel  and  source  optimized  UEP  and  rate  allo¬ 
cation  based  on  cutoff  rate  consideration), 

•  a  modified  sequential  decoding  algorithm  with  a  certain 
computational  resource  per  frame  [4],  and 

•  determination  of  the  virtual  error  free  part. 

For more details see [4]. The principle of our new method is
not to deliver the whole data block to the source decoder but
to deliver only the error-free part from the beginning of the
data block up to the first bit error. This is quite natural, as
the interpretation of the later bits by the source decoder after
an error has occurred is wrong anyway, as outlined before.
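The benefit of delivering only the error-free prefix can be illustrated with a toy embedded coder. The sketch below is not the FEED decoder itself: it assumes a hypothetical successive-approximation quantizer (`encode`, `reconstruct`) for a scalar in [0, 1), in which every additional bit halves the uncertainty interval, and it assumes that the position of the first bit error is known.

```python
def encode(x, n_bits):
    """Toy embedded coder (an assumption): successive bisection of [0, 1)."""
    lo, hi, bits = 0.0, 1.0, []
    for _ in range(n_bits):
        mid = (lo + hi) / 2
        if x >= mid:
            bits.append(1)
            lo = mid
        else:
            bits.append(0)
            hi = mid
    return bits


def reconstruct(bits):
    """Decode any prefix of the bit stream to the midpoint of its interval."""
    lo, hi = 0.0, 1.0
    for b in bits:
        mid = (lo + hi) / 2
        if b:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def deliver_error_free_prefix(bits, first_error):
    """Keep only the bits before the first error (None means no error)."""
    return bits if first_error is None else bits[:first_error]


x = 0.7
sent = encode(x, 8)
received = list(sent)
received[2] ^= 1  # a single decoding error at bit position 2

err_all = abs(reconstruct(received) - x)
err_prefix = abs(reconstruct(deliver_error_free_prefix(received, 2)) - x)
print(err_prefix, err_all)
```

In this toy case the two-bit prefix gives a smaller reconstruction error than the full eight bits containing the error, which matches the argument that the bits after the first error are misinterpreted by the source decoder.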

III.  Conclusions 

The  decoding  method  is  self-adaptive  to  varying  and  unknown 
channel  conditions  (interference,  fading,  packet  loss)  and  pro¬ 
vides  graceful  degradation.  The  presented  source  and  channel 
coding  system  is  appropriate  for  compound  channels  where 
the  transmitter  is  usually  not  aware  of  the  transmission  con¬ 
ditions  (Internet,  mobile  channels)  or  it  has  to  transmit  to 
several  or  many  users  with  different  receiving  conditions  at 
the  same  time  (broadcast).  Additionally,  our  new  method 
provides  a  trade-off  between  complexity  and  quality  due  to 
the  sequential  decoding  unit.  Therefore,  we  conclude  that  the 
far  end  error  decoding  (FEED)  algorithm  is  a  well  adapted 
error  protection  method  for  the  huge  amount  of  existing  and 
upcoming  progressive  source  coding  algorithms. 
Abstract — The deficient probability is the probability
that a sequential decoder fails to decode the received
data. For Jelinek's sequential decoder model,
we give an asymptotically tight bound on it here.

I.  Introduction 

Jelinek  introduced  the  term  “deficient  decoding”  to  refer  to 
decoding  failure  caused  either  by  decoding  error,  by  buffer 
overflow, or by other impairment [1]. It is well known [2]
that the number of computational efforts obeys the Pareto
distribution for DMCs and that it is this distribution that
hinders proper decoding at high rates. It is believed that the
probability of deficient decoding decreases proportionally to
$N^{1-\rho}$ as a function of the decoder buffer size $N$ whenever the
rate satisfies $E_0(\rho)/\rho = R$. Simulation results confirm this
statement, but there has been no rigorous proof of it.

II.  Convolutional  code  and  the  Fano  algorithm 

Consider a DMC with input and output alphabets $A$ and $B$,
respectively, and suppose that message sequences $u = u_1 u_2 \ldots$
over GF($q$) appear with equal probability. For each $u_i$, the
encoder of rate $R = \frac{1}{\nu} \log q$ first calculates
$s_{i,1} s_{i,2} \ldots s_{i,\nu}$ by

\[ s_{i,j} = \sum_{k=0}^{K-1} u_{i-k}\, g_{k+1,j}, \quad j = 1, 2, \ldots, \nu, \tag{1} \]

adds a bias sequence over GF($q$) as

\[ z_{i,j} = s_{i,j} + v_{i,j}, \quad j = 1, 2, \ldots, \nu, \tag{2} \]

and generates the channel input $x_{i,1} x_{i,2} \ldots x_{i,\nu}$ according to the
transformation

\[ z \to x \quad \text{whenever } z \in F(x) \text{ for } x \in A, \tag{3} \]

where $\{F(a), a \in A\}$ is a partition of GF($q$) into $|A|$ subsets.

When $g_{i,j}$ and $v_{i,j}$ are selected randomly and independently
of each other from GF($q$), then (1)-(3) define a random ensemble
of convolutional codes with the symbol probability $r(a) = |F(a)|/q$.
For this input probability assignment $r$, we let
$P(b) = \sum_{a \in A} r(a) P(b|a)$.

In the $q$-ary code tree, a message $u^\ell = u_1 u_2 \ldots u_\ell$ uniquely
specifies a path of length $\ell$ and, hence, a node at level $\ell$.

For a channel output sequence $y^\ell = y_1 y_2 \ldots y_\ell$, we assign
each node $u^\ell$ its weight

\[ \sum_{i=1}^{\ell} \log \frac{P(y_i \mid x_i)}{P(y_i)}, \]

where $x^\ell = x_1 x_2 \ldots x_\ell$ is the codeword for $u^\ell$ and
$P(y_i) = \prod_{j=1}^{\nu} P(y_{i,j})$. The Fano algorithm searches the code tree for
the path with the largest weight.

We assume the decoder model shown in Fig. 1, where the
search unit can retain a code tree of maximally $N$ branches in
height, and the control unit controls the process of sequential
decoding of search depth $S$.

III. Probability of deficient decoding

We show the following result.

Theorem 1 Suppose that $\rho R = E_0(\rho)$ for $\rho > 1$. Then, for
an arbitrarily given $\epsilon > 0$, the best attainable deficient
probability Pq satisfies inf Pg « 1 for sufficiently large a, S,
and N, where S is a positive number dependent on $\rho$.
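The encoder defined by (1)-(3) in Section II can be sketched in a few lines. This is a minimal illustration under stated assumptions: $q$ is taken to be prime so that GF($q$) arithmetic reduces to integer arithmetic modulo $q$, indices are zero-based, symbols $u_{i-k}$ with negative index are treated as zero, and the names `encode`, `taps`, `bias` and `partition_index` are hypothetical.

```python
def encode(u, taps, bias, partition_index, q):
    """Encode message symbols u over GF(q), q prime (assumption).

    taps[k][j]         plays the role of g_{k+1,j}   (K rows, nu columns)
    bias[i][j]         plays the role of v_{i,j}
    partition_index[z] is the channel letter x with z in F(x)
    """
    K, nu = len(taps), len(taps[0])
    channel_input = []
    for i in range(len(u)):
        branch = []
        for j in range(nu):
            # Equation (1): convolution of the message with the taps.
            s = sum(u[i - k] * taps[k][j] for k in range(K) if i - k >= 0) % q
            # Equation (2): add the bias sequence.
            z = (s + bias[i][j]) % q
            # Equation (3): map the field element to its channel letter.
            branch.append(partition_index[z])
        channel_input.append(branch)
    return channel_input


# GF(3) mapped onto a binary channel alphabet: F(0) = {0, 1}, F(1) = {2},
# so the induced input probabilities are r(0) = 2/3 and r(1) = 1/3.
example = encode(u=[1, 2],
                 taps=[[1, 2], [1, 0]],
                 bias=[[0, 0], [0, 0]],
                 partition_index=[0, 0, 1],
                 q=3)
print(example)
```

Drawing `taps` and `bias` uniformly at random from GF($q$) would give one member of the random ensemble described in the text.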
Figure 1: Sequential decoder model

I. Introduction

Recently, sequential decoding of large memory codes has been
investigated [1] for 8-PSK, 16-QAM, and larger constellations.
The results show that high reliability can be achieved at channel
signal-to-noise ratios (SNRs) where the code rate is nearly
equal to $R_0$, yielding 1-1.5 dB coding gain over Viterbi decoding
of small memory codes. Since Bootstrap Hybrid Decoding
(BHD) [2] is known to improve the cut-off rate of a sequential
decoder for binary modulation, further improvements can be
expected using this method applied to TCM.

In  this  paper  we  present  a  lower  bound  to  the  computa¬ 
tional  cut-off  rate  for  the  extension  of  the  BHD  scheme  to 
TCM.  Our  analysis  is  based  on  the  original  derivation  for  the 
case  of  binary  modulation  contained  in  [2].  Numerical  evalua¬ 
tions  of  the  expressions  obtained  for  the  cases  of  hard-decision 
8-PSK  and  8-level  quantized  4AM  modulation  show  significant 
improvements  over  the  usual  cut-off  rate,  similar  to  the  results 
obtained  in  [2]  for  BPSK  modulation.  These  results  suggest 
that  performance  close  to  capacity  can  be  achieved  with  suf¬ 
ficiently  powerful  TCM  systems  and  bootstrap  sequential  de¬ 
coding. 

II.  Bootstrap  Hybrid  Decoding  and  TCM 

Bootstrap  Hybrid  Decoding  can  be  described  as  follows. 
The encoder takes a set of $m - 1$ information sequences and
computes  their  sum.  It  then  encodes  all  m  sequences  using 
a  TCM  encoder  and  sends  them  over  a  noisy  channel.  Upon 
receiving  all  m  sequences,  the  sequential  decoder  attempts  to 
decode  each  sequence  using  a  modified  likelihood  function  that 
exploits  the  parity  constraint  introduced  by  the  encoder.  If 
it  succeeds  in  decoding  a  sequence,  it  assumes  the  estimate  is 
correct  and  subtracts  it  from  the  parity  constraint.  It  then 
updates  the  likelihood  function  based  on  the  new  parity  con¬ 
straint  and  proceeds  to  decode  the  remainder  of  the  unde¬ 
coded  sequences,  until  only  one  sequence  remains.  It  follows 
that  the  decoding  of  each  sequence  helps  in  decoding  the  re¬ 
maining  ones,  resulting  in  a  bootstrapping  effect. 
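The parity constraint behind the bootstrapping effect can be sketched as follows. The example assumes binary information sequences and a symbol-wise modulo-2 sum; the text only says "sum", so the field is an assumption, and the function names are hypothetical. It shows only the constraint itself, not the modified likelihood function or the sequential decoder.

```python
def parity_stream(info_streams):
    """Symbol-wise modulo-2 sum of the m - 1 information sequences."""
    return [sum(column) % 2 for column in zip(*info_streams)]


def remaining_stream(decoded_streams):
    """With m - 1 of the m streams decoded, the constraint fixes the last one."""
    return [sum(column) % 2 for column in zip(*decoded_streams)]


info = [[1, 0, 1, 1],
        [0, 0, 1, 0]]            # m - 1 = 2 information sequences
parity = parity_stream(info)     # the m-th transmitted sequence

# Suppose the first information stream and the parity stream were decoded.
recovered = remaining_stream([info[0], parity])
print(parity, recovered)
```

Because all $m$ streams sum to zero, every successfully decoded stream removes one unknown from the constraint, which is why each decoded sequence helps with the remaining ones.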

III.  A  Lower  Bound  on  Ro 

Consider a quantized AWGN channel with input alphabet
$I$, output alphabet $J$, and a set of transition probabilities
$\{p(y|x) \mid y \in J, x \in I\}$. Let $E_\infty(\sigma)$ be the Gallager function
for this channel,

\[ E_\infty(\sigma) = -\log \sum_{y \in J} \Big[ \sum_{x \in I} p(y|x)^{\frac{1}{1+\sigma}}\, p(x) \Big]^{1+\sigma}. \tag{1} \]

Let $R$ be the code rate and $\sigma_\infty$ be defined by
$R = E_\infty(\sigma_\infty)/\sigma_\infty$. For the BHD scheme, we have to take into
account the effect of the parity constraint, and thus the Gallager
function is written as

\[ E_k(\sigma) = -\log \sum_{\mathbf{y} \in J^m} \Big[ \sum_{x \in I} p(\mathbf{y}|x)^{\frac{1}{1+\sigma}}\, p(x) \Big]^{1+\sigma}, \tag{2} \]

where $J^m = J \times J \times \cdots \times J$ ($m$ times). Let $\sigma_k$ be defined
by $R = E_k(\sigma_k)/\sigma_k$. Let $k_R$ be the value of $k$ that makes the
quantities $k\sigma_\infty$ and $\sigma_k$ the closest.

Fig. 1: Lower bound on the cut-off rate for the BHD scheme with
hard-decision 8-PSK.

1This work was supported in part by NASA Grant NAG5-8355,
NSF Grant NCR95-22939, and CNPq Grant 200617/94-0.

The  main  theorem  can  now  be  stated  as  follows: 

Theorem 1 Let $R$ be the code rate in each of the $m$ streams
in the BHD scheme. Let $C$ be the number of computations
per decoded bit performed by the BHD decoder. Then, the $l$-th
moment of $C$, $E[C^l]$, is finite as long as

\[ \min\{\sigma_{k_R},\ (k_R + 1)\,\sigma_\infty\} > l, \]

where $\sigma_\infty$, $\sigma_k$, and $k_R$ were defined above.
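The quantity $\sigma_\infty$ in Section III is defined implicitly by $R = E_\infty(\sigma_\infty)/\sigma_\infty$, so in practice it has to be found numerically. The sketch below evaluates the single-channel Gallager function (1) and solves for $\sigma$ by bisection, using the fact that $E(\sigma)/\sigma$ is non-increasing in $\sigma$. The bracket `[lo, hi]`, base-2 logarithms, and the binary symmetric channel in the example are assumptions chosen for illustration; the BHD function $E_k$ of (2) is not implemented here.

```python
import math


def gallager_function(sigma, p_x, p_y_given_x):
    """E(sigma) = -log2 sum_y [ sum_x p(x) p(y|x)^(1/(1+sigma)) ]^(1+sigma)."""
    total = 0.0
    for y in range(len(p_y_given_x[0])):
        inner = sum(p_x[x] * p_y_given_x[x][y] ** (1.0 / (1.0 + sigma))
                    for x in range(len(p_x)))
        total += inner ** (1.0 + sigma)
    return -math.log2(total)


def solve_sigma(rate, p_x, p_y_given_x, lo=1e-6, hi=50.0):
    """Find sigma with E(sigma) / sigma = rate by bisection on [lo, hi]."""
    def ratio(s):
        return gallager_function(s, p_x, p_y_given_x) / s

    if ratio(lo) <= rate or ratio(hi) >= rate:
        return None  # no solution inside the assumed bracket
    for _ in range(100):
        mid = (lo + hi) / 2
        if ratio(mid) > rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example channel (an assumption): BSC with crossover 0.01, uniform inputs.
p_x = [0.5, 0.5]
bsc = [[0.99, 0.01], [0.01, 0.99]]
cutoff_rate = gallager_function(1.0, p_x, bsc)   # E(1) is the cut-off rate
sigma_at_cutoff = solve_sigma(cutoff_rate, p_x, bsc)
print(cutoff_rate, sigma_at_cutoff)
```

Operating exactly at the cut-off rate gives $\sigma = 1$, the boundary at which the first moment of the computation stops being finite for an ordinary sequential decoder.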

IV.  Conclusions 

A  lower  bound  to  the  computational  cut-off  rate  for  the  BHD 
scheme  applied  to  TCM  was  presented.  Its  numerical  evalu¬ 
ation  for  hard-decision  8-PSK  modulation  (see  figure  1)  and 
8-level  quantized  4AM  modulation  shows  that  performance 
close to capacity is possible, and simulations with unquantized
channel outputs show performance within 0.5 dB of turbo
TCM schemes, at a lower computational complexity. Due
to  its  very  low  undetected  frame  error  probability,  the  BHD 
scheme  is  very  attractive  in  applications  where  reliability  is  of 
prime  importance. 
Abstract — In this paper, a generalized proof is
given for the coding theorem of [1]. As a consequence,
further properties of the scheme are discussed.

I.  Introduction 

Although list decoders are seldom employed in applications, the
Generalized Viterbi Algorithm (GVA) uses a list decoder as its main
process [2]. In the analysis of the GVA, the error probability
is dominated by that of the list decoder. Therefore, we proposed
the decision feedback scheme with the GVA, which has a
constructive decision rule for list decoding with feedback [1].

In this paper, a stricter bound and a certain
generalization of the proof of [1] are shown. As a result,
further properties can be clarified. Throughout this paper,
a DMC characterized by $P = \{P_{ij},\ j \in A,\ i \in B\}$ and noiseless
feedback are assumed.

II.  The  algorithm  [1] 

We are concerned with a $q$-ary tree code, each branch of which
is assigned $\nu$ input symbols. So, the rate of the code
is defined by $R = \frac{1}{\nu} \ln q$. An information sequence of $N$
$q$-ary symbols specifies a path, denoted by $u^N$. Let the
subsequence of path $u^N$ from the root to the $n$-th level be $u^n$.
Furthermore, $L$ (branches) is defined as the decoding constraint
length of the GVA.

{  Initial  condition  and  Recursive  procedure  } 

(Step 1) Initial condition: At the level n — 1, each of the
$q^{L-1}$ states has its list, namely $S$ survivors. Each survivor of the list is
labeled “Accept”.

For the level n (L < n < N), repeat (Step 2)~(Step 4)
recursively.

(Step 2) Path extension: At the level $n$, all retained paths are
extended by one branch as $u^n = u^{n-1}u$. Then each metric of the $Sq^L$
paths is calculated.

(Step 3) Path selection: At each of the $q^{L-1}$ states, the best $S$ paths
are selected among the $qS$ paths as the list.

(Step  4)  Testing:  We  denote  the  selected  list  as  C,  C  — 
{u”jj  ,  u”2j,  •  -  - ,  u"sj},  and  u”s+1  ^  is  the  S  -|-  1-th  most  path  at  the 
state  1 .  The  listed  paths  are  labeled  “Accept”  or  “Reject”  by  the 
following  decision  rule.  However,  a  path  once  labeled  “Reject”  is 
kept  its  label  “Reject”.  The  rule  is  if  j-  ^  ^  <  A,  A  >  1 

holds,  u^j,i=  1,2,  —  ,  S  are  labeled  “Reject”.  Otherwise,  u”  ^ ,  i  =■ 
1, 2,  -  •  • ,  S  are  labeled  “Accept”.  If  there  is  no  survivor  labeled  “Ac¬ 
cept”  at  any  qL_1  state,  the  retransmission  is  required  and  restart 
from  Step  1. 

{ Final path selection at the check tail }

By L − 1 known symbols, the $q^{L-1}$ lists are reduced to one list with
Step 2~Step 4. Then, by T−(L − 1) known symbols, the best path
is selected among the S survivors of the final list. If the label of
the best path is “Reject”, a retransmission is required and decoding
restarts from Step 1.

1For the received sub-sequence $y^n$ from the root to level $n$, we
denote the likelihood of the $k$-th most likely path as $l_k(y^n)$.


III.  Main  results 

Though the analysis in [1] depends on each tree configuration
[5], the bounds newly obtained are independent of it. So, we
can newly examine the case where $S$ is very large. For obtaining the
feedback  exponent  —  — -  In  Pr{E‘i)  of  Forney  [4],  we  take  A  as 
— —lnPr(Ei)  — >  0  (L  -4  cxi)  ,  where  Pr{E\)  and  Pr(E2) 
are Pr[a decoding error occurs or a retransmission is
required] and Pr[a decoding error occurs], respectively. We
state this result as a theorem.

[Theorem]  As  S  — >  oo,  the  exponent  approaches  to  ej  (fl), 
e[°°\R)=  max  |so(l,a,/3,q)  +  a  ■  ey^R)!, 

X>4  =  {a>0,  p  >  0,  ec  —  E„(l,  a,  /3,q)  —  /3R  >  0} . 


Eo(R,  iTx,  pXr  q) 


e(f)(R)  =  max  EoF(S,v,  q), 


®3  =  { EoF(S ,  v,  q)  -  vSR  >0,  v  >  0} 


r  pi/u  1' 

k€Bj€A  L2^4  J  J 


IV.  Conclusion 

We show the properties of the decision feedback scheme using
a fixed-size list decoder, especially when the list size is very large.
The exponents we have obtained have properties similar
to those of the GVA.
Abstract — We compare the theoretical and empirical
performance of horizon-free universal portfolios
for a large number of stock pairs using real stock
market data in two scenarios: with and without side
information, and with and without short selling.


I.  Summary 

The horizon-free $\mu$-weighted universal portfolio is a sequential
investment algorithm that has been shown to perform as
well  as  the  best  constant  rebalanced  portfolio  to  first  order  in 
the  exponent  (cf.  [1]).  Additionally,  a  number  of  theoretical 
properties  of  the  universal  portfolio  have  been  derived.  We 
are  interested  in  the  performance  of  the  universal  portfolio  in 
the  actual  stock  market  and  how  this  performance  compares 
with  the  theory.  To  this  end,  we  determine  the  performance 
of  horizon-free  universal  portfolios  for  a  large  number  of  stock 
pairs  using  real  stock  market  data  in  two  scenarios:  with  and 
without  side  information,  and  with  and  without  short  selling. 

First, we observe for a large number of stock pairs that
the $n$-day universal portfolio return $\hat{S}_n$ consistently
performs near the best achievable constant rebalanced portfolio
return $S_n^*$, and a factor of 28 better than the minimax
lower bound of $V_n S_n^*$ established in [3], where

\[ V_n = \Big[ \sum_{n_1 + \cdots + n_m = n} \binom{n}{n_1, \ldots, n_m}\, 2^{-n H(n_1/n, \ldots, n_m/n)} \Big]^{-1}, \]

thus indicating that the market is not maximally hostile. We also
compute the ratio $\hat{S}_n / S_n^*$ for real data and compare it to the
theoretical asymptote

\[ \frac{(m-1)!\,(2\pi/n)^{(m-1)/2}}{|J_n|^{1/2}}, \]

where $m$ is the number of stocks and
$J_n$ is the sensitivity matrix (cf. [1]).
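For two stocks, the comparison between the universal portfolio return and the best constant rebalanced portfolio (CRP) return can be sketched numerically. The example assumes a uniform weighting $\mu$ on the portfolio $b \in [0, 1]$, in which case the universal portfolio wealth equals the average of the CRP wealths over $b$; the integral is approximated on a grid. The function names, the grid size and the synthetic price relatives are assumptions, not data from the study.

```python
def crp_wealth(b, price_relatives):
    """Wealth factor of the CRP that keeps a fraction b in stock 1."""
    wealth = 1.0
    for x1, x2 in price_relatives:
        wealth *= b * x1 + (1.0 - b) * x2
    return wealth


def best_crp_and_universal(price_relatives, grid=1001):
    """Return (best CRP wealth, uniform-weighted universal wealth) on a grid."""
    wealths = [crp_wealth(j / (grid - 1), price_relatives) for j in range(grid)]
    best = max(wealths)
    universal = sum(wealths) / len(wealths)  # approximates the integral over b
    return best, universal


# Synthetic market (an assumption): the two stocks alternately double and halve.
market = [(2.0, 0.5), (0.5, 2.0)] * 5
best, universal = best_crp_and_universal(market)
print(best, universal, universal / best)
```

In this synthetic market buy-and-hold of either stock ends at a wealth factor of 1, while the best CRP ($b = 1/2$) gains a factor of 1.25 per day; the universal portfolio falls between the two.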

We then extend the universal portfolio by using side
information to assign days to certain states, and utilize state
constant rebalanced portfolios as in [2]. For a state constant
rebalanced portfolio the trading days are divided up into
subsequences based on the state information. A constant
rebalanced portfolio is then used independently on each
subsequence of days. One example of side information $y_i$ for a pair
of stocks is to assign each day $i$ to one of two states, 1 or 2,
corresponding to the stock with the larger windowed moving
average of price relatives for the last $k$ days. The best constant
rebalanced portfolio $b^*$ and the universal portfolio $\hat{b}_i$ are based
on the current and past state information $y^i$ and past price
relatives $x^{i-1}$. The best constant rebalanced portfolio return
$S_n^*(x^n|y^n)$ is the product of the best constant rebalanced
portfolio returns for the subsequence of days associated with each
state:


\[ S_n^*(x^n|y^n) = \Big( \max_{b} \prod_{i:\, y_i = 1} b^t x_i \Big) \Big( \max_{b} \prod_{i:\, y_i = 2} b^t x_i \Big). \]


1This work was supported in part by NSF grant NCR-9628193,
MURI DAAD19-99-1-0252 and JSEP grant DAAG55-97-1-0115.


Similarly, $\hat{S}_n$ is the product of the wealth factors associated
with  the  independent  running  of  the  universal  portfolio  on 
the  subsequences  of  trading  days  associated  with  each  of  the 
states.  For  actual  stock  market  data  we  observe  that  this  sim¬ 
ple  nonanticipative  algorithm  achieves  factors  as  large  as  105 
for  some  stock  pairs  over  a  twenty  year  period.  When  the 
side  information  of  the  windowed  moving  average  is  used  for 
two portfolios without short selling, the log optimal portfolio
$b^*$ for each state often exhibits a “bang-bang” effect, where
all  the  wealth  is  allocated  to  a  single  stock.  This  “bang-bang” 
effect  often  has  all  the  wealth  pouring  into  the  stock  which  has 
been  underperforming.  Additionally,  the  “bang-bang”  effect 
indicates  that  even  more  wealth  can  be  generated  by  selling 
one  stock  short  and  buying  more  of  the  other.  Consequently, 
we  analyze  the  effect  of  short  selling  and  the  tradeoffs  be¬ 
tween  return  and  amount  of  leverage.  We  also  compare  the 
performance of the $\mu$-weighted universal portfolio with the
exponentiated gradient universal portfolio as in [4]. Next, we
look  at  the  performance  of  the  universal  portfolio  with  and 
without  side  information,  and  with  and  without  short  selling 
for  a  portfolio  of  fifty  stocks. 
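The state constant rebalanced portfolio with moving-average side information can be sketched for a pair of stocks as follows. Several details are assumptions made for illustration: the window uses only the previous $k$ days (so the state is nonanticipative), the first day is arbitrarily assigned to state 1, ties go to state 1, no short selling is allowed ($0 \le b \le 1$), and the maximization is done on a grid.

```python
def crp_wealth(b, days):
    """Wealth factor of the CRP holding a fraction b in stock 1 over `days`."""
    wealth = 1.0
    for x1, x2 in days:
        wealth *= b * x1 + (1.0 - b) * x2
    return wealth


def moving_average_states(price_relatives, k):
    """State of day i: the stock with the larger mean price relative
    over the previous k days (1 or 2)."""
    states = []
    for i in range(len(price_relatives)):
        window = price_relatives[max(0, i - k):i]
        if not window:
            states.append(1)  # arbitrary convention for the first day
            continue
        avg1 = sum(x[0] for x in window) / len(window)
        avg2 = sum(x[1] for x in window) / len(window)
        states.append(1 if avg1 >= avg2 else 2)
    return states


def best_state_crp_wealth(price_relatives, states, grid=101):
    """Product over the two states of the best CRP wealth on that state's days."""
    total = 1.0
    for state in (1, 2):
        days = [x for x, y in zip(price_relatives, states) if y == state]
        total *= max(crp_wealth(j / (grid - 1), days) for j in range(grid))
    return total


market = [(2.0, 0.5), (0.5, 2.0)] * 3          # synthetic data (assumption)
oracle_states = [1, 2] * 3                     # state names the winning stock
print(best_state_crp_wealth(market, oracle_states))
print(moving_average_states(market[:3], k=1))
```

With perfectly informative states the best portfolio in each state sits at an endpoint of $[0, 1]$, which is the "bang-bang" behaviour described in the text.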

Finally,  we  explore  the  use  of  several  heuristic  methods  for 
increasing  the  rate  at  which  the  universal  portfolio  learns  the 
stock  market.  These  methods  include  several  ways  of  creating 
a  fake  market  associated  with  the  actual  market,  computing 
portfolios  in  the  fake  market,  and  mapping  portfolios  from  the 
fake  market  back  to  the  actual  market. 
Abstract — An adaptively random searching algorithm
is proposed for the log-optimal portfolio. Under
reasonable conditions the convergence of this algorithm
is shown. The numerical results using this algorithm
for real data from the Shanghai Stock Exchange
are satisfactory.

I.  Introduction 

Suppose one wishes to invest in a stock market consisting
of $m$ stocks. We denote the market by a random vector
$X = (X_1, X_2, \ldots, X_m)^T$, where $X_i$ denotes the price relative
of the $i$-th stock, $i = 1, 2, \ldots, m$. $X$ is supposed to be drawn
according to the distribution $F(x)$, $x \in R^m$. A portfolio
$b = (b_1, b_2, \ldots, b_m)^T$, $b_i \ge 0$, $\sum_i b_i = 1$, is an allocation of
investment capital. The expected log return $W(b)$ is defined
by $W(b) = E[\ln(b^T X)] = \int \ln(b^T x)\, dF(x)$. The problem is
to find the optimal portfolio $b^*$ that attains the maximal
expected log return $W^*$, which is defined by $W^* = \max_b W(b)$.
A systematic discussion can be found in Cover and Thomas'
book [1, Chapter 15]. Optimization algorithms abound for
this problem. Readers can refer to Cover [2], Ye and Zhang [3]
and the references cited therein for details. In this paper we
suggest an adaptively random searching algorithm which is
different from the above-mentioned algorithms. The main body
of the result is derived from a more general framework of the
constrained stochastic gradient method.

II.  Main  Results 

The purpose is to solve the problem of finding $b^*$ to achieve

\[ W^* = \max_b W(b) = \max_b \int \ln(b^T x)\, dF(x), \tag{1} \]

based upon the observed data $x(1), x(2), \ldots, x(t), \ldots$, where
$x(t)$ is the observed stock return vector for the $t$-th day.

First we take a quadratic parameter transformation $b_i = w_i^2$
to change the original constraint manifold, an $(m-1)$-dimensional
simplex, $B = \{b : b_i \ge 0, \sum_i b_i = 1\}$, into an
$(m-1)$-dimensional unit sphere,

\[ S^{m-1} = \{ w = (w_1, w_2, \ldots, w_m) : \sum_i w_i^2 = 1 \}. \]

Then the problem considered becomes

\[ \text{Maximize } E[f(w, X)], \]

where $f(w, x) = \log\big(\sum_{i=1}^{m} w_i^2 x_i\big)$.

Next  we  propose  the  following  algorithm  to  solve  the  above 
problem: 

1This  work  was  supported  by  the  National  Natural  Science  Foun¬ 
dation  of  China  under  the  Grant  No.  79790130  and  19901018. 


Step 0 Given any initial guess $w^{(0)} \in S^{m-1}$, $t = 0$;

Step 1 Compute the modified quantities
$w^{(t+1)} = (w_1^{(t+1)}, w_2^{(t+1)}, \ldots, w_m^{(t+1)})^T$
by the formula $w^{(t+1)} = w^{(t)} + \Delta w^{(t)}$, where
$\Delta w^{(t)} = \eta(t)\, \mathrm{grad}_M f(w^{(t)}, x^{(t)})$, $t = 0, 1, \ldots$, and $\eta(t)$ is some chosen
positive step size, generally called the learning rate. The
gradient vector is easy to compute by


gradM/(w' 


(t)  MU 


)  =  { 


Xi(t)w\ 


(0 


-.m 

Ji=l- 


Step 2 Halt if the number of iterations reaches some predetermined
number, or the norm of the difference $w^{(t+1)} - w^{(t)}$
is smaller than some given control precision; otherwise,
$(t + 1) \to t$, go to Step 1.

The learning rate sequence $\{\eta(t)\}$ satisfies

\[ \eta(t) > 0, \quad \sum_t \eta(t) = +\infty, \quad \eta(t) \to 0 \ \text{as } t \to +\infty. \]

It is shown that under some reasonable conditions, this
algorithm converges and results in the optimal portfolio
$b^* = (w_1^2, w_2^2, \ldots, w_m^2)^T$, which achieves

\[ W^* = \max_b W(b). \]

The numerical results based on this algorithm with real data
from the Shanghai Stock Exchange are satisfactory.
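Steps 0 to 2 can be sketched as a stochastic gradient ascent on the unit sphere. The sketch rests on assumptions that are not taken from the paper: the tangential gradient used below, $2 w_i \big( x_i / \sum_j w_j^2 x_j - 1 \big)$, is derived here by projecting the ordinary gradient of $f(w, x) = \log \sum_i w_i^2 x_i$ onto the tangent space of the sphere; the iterate is renormalized after every step to stay on the sphere; the learning rate is $\eta(t) = 1/(t + 10)$; and observations are drawn at random from a finite list of samples. Only the fixed iteration count of Step 2 is implemented as a stopping rule.

```python
import math
import random


def log_optimal_sgd(samples, steps, seed=0):
    """Return an estimate of b* = (w_1^2, ..., w_m^2) after `steps` updates."""
    rng = random.Random(seed)
    m = len(samples[0])
    w = [1.0 / math.sqrt(m)] * m              # Step 0: a point on the sphere
    for t in range(steps):
        x = rng.choice(samples)               # observed price relatives x(t)
        portfolio_return = sum(wi * wi * xi for wi, xi in zip(w, x))
        # Tangential gradient on the sphere (derived here, an assumption).
        grad = [2.0 * wi * (xi / portfolio_return - 1.0) for wi, xi in zip(w, x)]
        eta = 1.0 / (t + 10)                  # eta > 0, sum = inf, eta -> 0
        w = [wi + eta * gi for wi, gi in zip(w, grad)]   # Step 1
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]           # renormalize (an assumption)
    return [wi * wi for wi in w]


# Degenerate market (an assumption): stock 1 always doubles, stock 2 is flat,
# so the log-optimal portfolio puts all capital in stock 1.
b_hat = log_optimal_sgd([(2.0, 1.0)], steps=2000)
print(b_hat)
```

The quadratic parameterization $b_i = w_i^2$ keeps every iterate a valid portfolio, since the components are non-negative and sum to one on the sphere.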
We consider $N$-layer scalable source coding of a finite memoryless
source $X \sim p_X$. Let $X^i$ denote $X_1, \ldots, X_i$, where $X_i$
is the reproduction at the $i$th layer. From [5], we know that a
scalable coder can achieve the sequence of decreasing distortions
$D = \{D_i\}_{i=1}^{N}$ and increasing rates $R = \{R_i\}_{i=1}^{N}$ if and
only if there exists a conditional distribution $Q_{X^N|X}$ such that

\[ E(d(X, X_i)) \le D_i, \quad i = 1, \ldots, N, \]

\[ I(X; X^i) \le R_i, \quad i = 1, \ldots, N. \]


Thus,  the  problem  is  that  of  double  minimization  and,  as  will 
be  shown,  is  solvable  by  alternating  minimization. 

Lemma  2: 

a)  Given  Q,  arginfq  Fa  ^(Q,q)  is  the  marginal 

9x„(Q)  =  ^2pxQ*n  i* 

X 

b)  Given  q,  arginfQ  is  given  by 


The $2N$-dimensional achievability region $\mathcal{A}$ is convex. Hence,
in order to find a point on the boundary of $\mathcal{A}$ with an inward
normal vector $(\alpha = \{\alpha_i\}_{i=1}^{N}, \beta = \{\beta_i\}_{i=1}^{N})$, we must solve the
following minimization problem:

\[ F_{\alpha,\beta} = \inf_{Q_{X^N|X}} \sum_{i=1}^{N} \alpha_i I(X; X^i) + \beta_i E(d(X, X_i)). \]


O  (  )  =  q*N  exp^~  &dx’x '  +  Q*  log  /«,xj 

X'V '*  q  Ezm  <l*N  exp  {-  Et=i  +  a'i  log  /i,Zi } 


where  a'  =  a*/  £\=i  <*j  and  Pi  =  Pi/  and 

/*,*,  =  X^^-+il’'.exp{-A'+1dx1*i+I}(/*|x1i,zi+1)1_ai+1  , 


The above problem was first addressed by Effros [4, Section V].
A new system of equations and inequalities regarding
the  optimal  marginal  qXN  was  formed,  and  all  tentative  solu¬ 
tions  (extracted  from  the  equations)  were  tried  until  the  one 
satisfying  the  optimality  conditions  (the  inequalities  in  the 
system)  was  found.  (See  [1,  Section  2.6]  for  the  details  of 
the approach for the ordinary rate-distortion problem.) However,
it was not clear how $q_{x_{i+1}|x^i}$ should be defined when
$q_{x^i} = 0$. In fact, we showed that satisfaction of the conditions
given in [4] for some assumed $q_{x_{i+1}|x^i}$ when $q_{x^i} = 0$
does not necessarily imply the optimality of $q_{x^N}$. Moreover,
this  approach  becomes  impractical  as  the  size  of  the  output 
alphabet  grows.  (For  an  extreme  example,  consider  contin¬ 
uous  source  and  reproduction  alphabets.)  As  a  remedy,  we 
propose  an  iterative  algorithm  which  is  a  generalization  of  the 
Blahut-Arimoto  (BA)  algorithm  [2]  for  rate-distortion  com¬ 
putation.  The  algorithm  is  initialized  with  arbitrary  nonzero 
reproduction  probabilities,  and  monotonically  approaches  the 
optimal  reproduction  distribution.  We  also  revise  the  optimal¬ 
ity  conditions  to  handle  the  complications  that  arise  whenever 

<7x,;  =  0. 

Let  Q  =  {Qxw|a;}  and  q  =  {gxN}  denote,  in  vector  no¬ 
tation,  a  random  encoding,  and  a  reproduction  distribution, 
respectively. 

Lemma  1: 

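\[ F_{\alpha,\beta} = \inf_{Q} \inf_{q} F_{\alpha,\beta}(Q, q), \]

where

\[ F_{\alpha,\beta}(Q, q) = \sum_{i=1}^{N} \beta_i E_Q(d(X, X_i)) + \alpha_i D(Q_{X^i|X}\, p_X \,\|\, q_{X^i}\, p_X). \]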

1This  work  was  supported  in  part  by  the  NSF  under  grant  no. 
MIP-9707764,  the  UC  MICRO  program,  Conexant  Systems,  Inc., 
Fujitsu Laboratories of America, Inc., Lernout & Hauspie Speech
Products,  Lucent  Technologies,  Inc.  and  Qualcomm,  Inc. 


for  i  =  1, . . . ,  N  —  1,  and  f”XN  =  1. 

Theorem 1: Let $q^{(0)}$ be positive everywhere, and let
$Q^{(n)} = Q(q^{(n-1)})$ and $q^{(n)} = q(Q^{(n)})$ for $n = 1, 2, 3, \ldots$.
Then the sequence $q^{(0)}, Q^{(1)}, q^{(1)}, Q^{(2)}, \ldots$ converges to

\[ (Q^*, q^*) = \arg\inf F_{\alpha,\beta}(Q, q). \]

The proof follows the same line as the proof for the optimality
of BA, given in [3]. Finally, the optimality conditions are given
by

Theorem  2:  A  given  q  is  optimal  if  and  only  if  there  exists 
a  legitimate  qXy+l\*,  for  all  qx,  =  0,  so  that 

r’x  <  vXN  i  <  •  •  •  <  vXl  <  1  , 
for  all  Xat,  where 

v  =  Pxflxj  exp  {-  ZUi  +  ai  lQg  /lx, } 

*  <?* v  exp  {-  £f=1  +  a[  log  f'XyZ. }  ‘ 
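The alternating minimization of Lemma 2 and Theorem 1 generalizes the ordinary Blahut-Arimoto iteration for rate-distortion computation. As a point of reference, the sketch below implements only that ordinary single-layer iteration, not the $N$-layer algorithm of this paper: the encoder step sets $Q(y|x) \propto q(y) e^{-\beta d(x,y)}$ and the reproduction step sets $q$ to the marginal of $Q$. The function name, the fixed iteration count and the binary example are assumptions.

```python
import math


def blahut_arimoto_rd(p_x, d, beta, iters=200):
    """Single-layer Blahut-Arimoto iteration for one slope parameter beta.

    p_x[x]  source distribution, d[x][y] distortion matrix.
    Returns (rate in bits, distortion, reproduction distribution q).
    """
    n_x, n_y = len(p_x), len(d[0])
    q = [1.0 / n_y] * n_y                      # positive starting point
    for _ in range(iters):
        # Encoder step: best Q for the current q.
        Q = []
        for x in range(n_x):
            row = [q[y] * math.exp(-beta * d[x][y]) for y in range(n_y)]
            norm = sum(row)
            Q.append([r / norm for r in row])
        # Reproduction step: q is the marginal induced by Q.
        q = [sum(p_x[x] * Q[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(p_x[x] * Q[x][y] * d[x][y]
                     for x in range(n_x) for y in range(n_y))
    rate = sum(p_x[x] * Q[x][y] * math.log2(Q[x][y] / q[y])
               for x in range(n_x) for y in range(n_y) if Q[x][y] > 0)
    return rate, distortion, q


# Uniform binary source with Hamming distortion, where R(D) = 1 - h(D).
rate, distortion, q = blahut_arimoto_rd([0.5, 0.5], [[0, 1], [1, 0]], beta=2.0)
print(rate, distortion)
```

For the binary example the result can be checked against the closed form $R(D) = 1 - h(D)$, with $D = e^{-\beta}/(1 + e^{-\beta})$.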
I. Introduction

The  Arimoto-Blahut  algorithm  ([1],  [2])  determines  the  ca¬ 
pacity  of  a  discrete  memoryless  channel  through  an  iterative 
process  in  which  the  input  probability  distribution  is  adapted 
at  each  iteration.  While  it  converges  towards  the  capacity- 
achieving  distribution  for  any  discrete  memoryless  channel, 
the  convergence  can  be  slow  when  the  channel  has  a  large 
input  alphabet.  This  is  unfortunate  when  only  a  small  num¬ 
ber  of  the  input  letters  are  assigned  non-zero  probabilities  in 
the  capacity-achieving  distribution.  If  we  knew  which  input 
letters  will  end  up  with  a  probability  of  zero,  we  could  elim¬ 
inate  these  letters  and  operate  the  algorithm  on  a  subset  of 
the  input  alphabet.  The  algorithm  would  converge  towards 
the  same  solution  faster. 

We  present  an  algorithm  which  makes  use  of  this  fact  to 
speed  up  the  convergence  of  the  Arimoto-Blahut  algorithm 
in  such  situations.  The  algorithm  starts  with  an  input  alpha¬ 
bet  consisting  of  two  symbols,  then  grows  the  alphabet  by  one 
symbol  at  every  iteration  until  it  includes  all  the  symbols  with 
non-zero  probabilities.  At  every  iteration,  the  Arimoto-Blahut 
algorithm  is  used  to  compute  a  capacity  relative  to  a  partial 
input  alphabet.  When  our  algorithm  terminates,  it  will  have 
found  the  same  solution  as  the  Arimoto-Blahut  algorithm  ap¬ 
plied  to  the  complete  input  alphabet.  However,  we  cannot 
guarantee  that  our  algorithm  will  include  only  symbols  with 
non-zero  probabilities  in  the  partial  alphabet  it  ends  up  with. 


II.  The  Algorithm 

Let $X$ be the input random variable to a discrete memoryless
channel, and let $X$ take on values over the finite alphabet
$A$. Let $Y$ be the output random variable of the channel and
let $C$ be its capacity. We define

\[ I(X = x; Y) = \sum_{y} P_{Y|X}(y|x) \log \frac{P_{Y|X}(y|x)}{P_Y(y)} \]

as in [3]. Let $C_{A'}$ denote the capacity of the discrete memoryless
channel induced when all but the letters in the subset
$A'$ of $A$ are forcibly assigned a probability of zero. We give
an outline of our algorithm:

1. Determine $(x, y) \in A^2$ which maximizes $C_{\{x,y\}}$ over all
choices of $x$ and $y$. Define $A' = \{x, y\}$ and $C' = C_{A'}$.

2. If $A' = A$, then $C = C'$ and the algorithm terminates.
Otherwise, for all $x \in A \setminus A'$, compute $I(X = x; Y)$. If
the values computed are all smaller than or equal to $C'$, then
$C = C'$ by [3, Theorem 4.5.1] and the algorithm can be
terminated at this point.

3. Add the symbol $x$ that maximized $I(X = x; Y)$ in step 2
to the set $A'$. Recompute $C' = C_{A'}$ using the Arimoto-Blahut
algorithm. Return to step 2.
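Steps 1 to 3 can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the channel is given as a matrix `P[x][y]`, logarithms are base 2, the inner Arimoto-Blahut loop stops on a fixed tolerance, and the termination test of step 2 uses a small numerical slack `eps`. All function names are hypothetical.

```python
import math
from itertools import combinations


def info(P, x, q):
    """I(X = x; Y) in bits for output distribution q."""
    total = 0.0
    for y, pxy in enumerate(P[x]):
        if pxy > 0:
            if q[y] == 0:
                return math.inf
            total += pxy * math.log2(pxy / q[y])
    return total


def arimoto_blahut(P, letters, iters=2000, tol=1e-12):
    """Capacity of the channel restricted to the input subset `letters`."""
    letters = list(letters)
    n_y = len(P[0])
    p = {x: 1.0 / len(letters) for x in letters}

    def output(p):
        return [sum(p[x] * P[x][y] for x in letters) for y in range(n_y)]

    for _ in range(iters):
        q = output(p)
        new = {x: (p[x] * 2.0 ** info(P, x, q) if p[x] > 0 else 0.0)
               for x in letters}
        norm = sum(new.values())
        new = {x: v / norm for x, v in new.items()}
        change = max(abs(new[x] - p[x]) for x in letters)
        p = new
        if change < tol:
            break
    q = output(p)
    capacity = sum(p[x] * info(P, x, q) for x in letters if p[x] > 0)
    return capacity, p, q


def growing_alphabet_capacity(P, eps=1e-9):
    """Steps 1-3: grow the partial alphabet until the stopping test holds."""
    alphabet = range(len(P))
    # Step 1: best pair of input letters.
    partial = list(max(combinations(alphabet, 2),
                       key=lambda pair: arimoto_blahut(P, pair)[0]))
    while True:
        capacity, p, q = arimoto_blahut(P, partial)
        rest = [x for x in alphabet if x not in partial]
        if not rest:
            return capacity, partial
        # Step 2: test the excluded letters against the current capacity.
        scores = {x: info(P, x, q) for x in rest}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= capacity + eps:
            return capacity, partial
        partial.append(candidate)              # Step 3


# Example channel (an assumption): two noiseless inputs and one useless input.
P = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
capacity, partial = growing_alphabet_capacity(P)
print(capacity, partial)
```

In this example the third input never enters the partial alphabet, so every Arimoto-Blahut run operates on two letters only.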


The  algorithm  is  certain  to  obtain  the  correct  solution  for  the 
following  reasons: 


1This paper documents work performed at the Signal &
Inf. Proc. Lab., ETH Zürich, Switzerland.


•  When the algorithm exits in step 2 because all the values
of $I(X = x; Y)$ are smaller than or equal to $C'$, the solution
is guaranteed to be correct by [3, Theorem 4.5.1].

•  The algorithm must eventually exit. In the worst case,
it will end up including all the symbols of $A$ in $A'$.
In this case, the last occurrence of step 3 applies the
Arimoto-Blahut algorithm to the complete input alphabet.
This will be the case when our iterated algorithm
is applied to channels whose capacity-achieving input
distributions have only non-zero terms.
As already mentioned, there is no guarantee that the algorithm
will include in its partial alphabet $A'$ only symbols whose
probabilities in the capacity-achieving distribution are non-zero.
However, the practical examples on which the algorithm was tested
seem to indicate that the inclusion of zero-probability symbols is
highly unlikely.

III.  Practical  Implementation  and  Conclusion 
The  practical  need  for  such  an  algorithm  arose  in  an  at¬ 
tempt  to  compute  the  optimal  coding  distributions  for  uni¬ 
versal  lossless  source  coding  over  sets  of  discrete  memory¬ 
less  sources  with  monotone  non-increasing  probability  distrib¬ 
utions  with  a  fixed  average.  The  problem  of  determining  the 
optimal  coding  distribution  for  universal  coding  over  a  set  of 
probability  distributions  is  equivalent  to  the  problem  of  com¬ 
puting  the  capacity  of  a  discrete  memoryless  channel  [4]. 

For  an  alphabet  size  of  256,  the  problem  of  determining 
the  optimal  coding  distribution  for  the  set  of  monotone  non¬ 
increasing  distributions  with  a  fixed  average  corresponds  to 
the  computation  of  the  capacity  of  a  channel  with  an  input 
alphabet  of  up  to  16’000  letters.  Of  those  16’000  letters,  only 
about  100  letters  have  non-zero  probabilities  in  the  capacity- 
achieving  distribution.  Therefore,  the  algorithm  presented 
here  allowed  a  considerable  acceleration  of  the  convergence 
when compared to a conventional implementation of the
Arimoto-Blahut algorithm. A detailed presentation of this
application, along with more information on the implementation of
the algorithm, is given in [5].

Abstract — The theory and practice of digital communication
during the past 50 years has been strongly
influenced by Shannon’s separation theorem [1].
While it is conceptually and practically appealing to
separate source from channel coding, either step requires
infinite delay in general for optimal performance.
On the other extreme is uncoded transmission,
which has no delay but is suboptimal in general.
In this paper, necessary and sufficient conditions
for the optimality of uncoded transmission are shown.
These conditions allow the construction of arbitrary
examples of optimal uncoded transmission (beyond
the well-known Gaussian example).

I.  Previous  and  Basic  Results 

We consider a discrete-time memoryless source represented
by the random variable S ∈ 𝒮. The source output S is applied
directly to a memoryless channel.² The channel output Y
is our estimate of the source with respect to a distortion
measure d(s, y). The source is specified by a probability density
(or mass) function p(s) and a distortion measure d(s, y). The
channel is specified by a conditional probability density (or
mass) function W(y|s) and a channel input cost function ρ(s).
Therefore, uncoded transmission achieves (average) distortion
Δ = E d(S, Y) and (average) input cost Γ = E ρ(S).

Definition. Uncoded transmission of the source (p, d) across
the channel (W, ρ) is optimal if: (i) Δ is the minimum distortion
achievable when the maximum input cost is Γ; and (ii) Γ
is the minimum input cost to achieve distortion at most Δ.

Let R(D) be the rate-distortion function of the source, and
D(R) the distortion-rate function. Correspondingly, let C(P)
be the capacity-cost function of the channel, and P(C) the
cost-capacity function. From the separation theorem, we have
the following Fact: Uncoded transmission of the source (p, d)
across the channel (W, ρ) is optimal if and only if (i) Δ =
D(C(Γ)), and (ii) Γ = P(R(Δ)).

These two conditions are cumbersome to work with. For
most cases of interest, we can find simpler necessary and
sufficient conditions. However, let us first exclude certain special
cases. Let C_0 denote the capacity of the unconstrained
channel (W, ρ), i.e. C_0 = C(P → ∞).

Condition A. The source (p, d) and the channel (W, ρ)
satisfy condition A if (i) in case I(p, W) = 0, W is the
unique achiever of zero mutual information, and (ii) in case
I(p, W) = C_0, p is the unique achiever of C_0.

The condition ensures that D(R(·)) and P(C(·)) are the
identity functions, respectively.

Lemma 1. Granted condition A, uncoded transmission of
the source (p, d) across the channel (W, ρ) is optimal if and
only if R(Δ) = C(Γ).

¹ M. Vetterli is also with the Dept. of EECS, UC Berkeley, USA.

² For the framework of this paper, we assume that the channel
input alphabet is also 𝒮. The extension to arbitrary memoryless
encoders and decoders will be presented at a later stage.

This  Lemma  follows  essentially  from  [2];  however,  Condi¬ 
tion  A  was  not  mentioned  there.  On  a  more  intuitive  level, 
Lemma  1  implies  the  following: 

Lemma 2. Granted condition A, uncoded transmission of
the source (p, d) across the channel (W, ρ) is optimal if and
only if (i) the source p achieves capacity on the channel (W, ρ)
(at input cost Γ), and (ii) the channel W achieves the
rate-distortion function of the source (p, d) (at distortion Δ).

Unfortunately, in order to compute rate-distortion and
capacity-cost functions, we have to resort to numerical methods
in general. Thus, neither Lemma 1 nor Lemma 2 gives an
explicit way to verify whether or not uncoded transmission is
optimal for a given source and channel, or to construct
examples of such source/channel pairs.

II.  Main  Result 

Proposition. Uncoded transmission of the source (p, d)
across the channel (W, ρ) for which 0 < I(p, W) < C_0 is
optimal if and only if

ρ(s) = c_1 D(W(·|s) ‖ p_Y) + ρ_0

d(s, y) = −c_2 log_2 p(s|y) + d_0(s),

for some constants c_1 > 0, c_2 > 0, ρ_0 and an arbitrary
function d_0(s), where D(·‖·) is the Kullback-Leibler distance,
p_Y(y) = E W(y|S) is the pdf of Y, and p(s|y) = p(s) W(y|s) / p_Y(y)
is the conditional pdf of the source given the channel output.

A proof of this proposition can be found in [3]. A similar
result can be obtained for the case I(p, W) = C_0. Note that
the proposition allows one to construct essentially all occurrences
of optimal uncoded transmission.
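
As an illustration of how the proposition can be used constructively, the following sketch computes a matched cost function and distortion measure for a given finite-alphabet source and channel. It assumes strictly positive p(s) and W(y|s) so that all logarithms are finite; the function name and the default constants are hypothetical choices made here, and ρ is written `rho`.

```python
import math

def matched_cost_and_distortion(p, W, c1=1.0, c2=1.0, rho0=0.0, d0=None):
    """Return (rho, d) with rho[s] = c1 * D(W(.|s) || p_Y) + rho0 and
    d[s][y] = -c2 * log2 p(s|y) + d0[s], for a pmf p and channel matrix W[s][y]."""
    n_in, n_out = len(W), len(W[0])
    d0 = d0 or [0.0] * n_in
    # Output distribution p_Y(y) = sum_s p(s) W(y|s)
    p_y = [sum(p[s] * W[s][y] for s in range(n_in)) for y in range(n_out)]
    rho = [
        c1 * sum(W[s][y] * math.log2(W[s][y] / p_y[y]) for y in range(n_out)) + rho0
        for s in range(n_in)
    ]
    # Posterior p(s|y) = p(s) W(y|s) / p_Y(y)
    d = [
        [-c2 * math.log2(p[s] * W[s][y] / p_y[y]) + d0[s] for y in range(n_out)]
        for s in range(n_in)
    ]
    return rho, d
```

With c_1 = c_2 = 1 and ρ_0 = d_0 = 0, the average cost E ρ(S) equals the mutual information I(p, W) and the average distortion E d(S, Y) equals the conditional entropy H(S|Y), which gives a quick sanity check of the construction.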

Universality  of  uncoded  transmission.  The  most  interest¬ 
ing  applications  of  uncoded  transmission  are  cases  where  the 
separation  theorem  does  not  hold,  e.g.  non-ergodic  channels 
or  multi-user  communication.  Consider  a  broadcast  scenario 
with  one  source  and  many  (different)  channels.  If  it  turns  out 
that  the  above  proposition  is  satisfied  for  the  source  and  each 
channel  individually,  then  uncoded  (broadcast)  transmission 
is  (globally)  optimal.  In  this  example,  uncoded  transmission 
exhibits  a  property  of  universality,  whereas  the  performance 
of  any  separation-based  coding  scheme  is  strictly  suboptimal. 

Abstract — Two source-channel coding strategies,
tandem coding and channel-optimized quantization,
are compared on the basis of distortion vs. complexity.
For each, formulas for the mean-squared error
and complexity are used to minimize distortion subject
to a complexity constraint. The results suggest there
is a threshold such that tandem coding is better than
channel-optimized quantization when and only when
the complexity constraint is larger than the threshold.

I.  Introduction  and  System  Description 

An  oft  claimed  motivation  for  joint  source-channel  cod¬ 
ing  (JSCC)  is  the  potential  to  obtain  good  performance  with 
less  delay  or  complexity  than  with  the  conventional  tandem 
source-channel  coding  (TSCC).  However,  little  quantitative 
support  for  this  claim  has  appeared  in  the  literature.  In  this 
work,  we  seek  to  determine  the  validity  of  the  claim  by  quanti¬ 
tatively  comparing  representative  systems  of  each  type  on  the 
basis  of  distortion  vs.  complexity.  (Delay  is  not  considered.) 

To  avoid  idiosyncrasies,  the  source,  channel  and  represen¬ 
tative  systems  are  chosen  to  be  as  plain  vanilla  as  possible. 
Specifically,  the  source  is  Gauss-Markov;  the  channel  is  binary 
symmetric  (BSC);  the  distortion  measure  is  mean-squared  er¬ 
ror  (MSE);  the  TSCC,  as  in  [1],  is  a  conventional  transform 
(source)  code  in  tandem  with  a  Reed-Solomon  (R-S)  channel 
code,  and  the  JSCC  is  a  channel-optimized  transform  code 
(COTC),  which  is  a  kind  of  channel-optimized  quantizer.  In 
both  cases,  a  KLT  transform  is  used,  and  the  transform  coeffi¬ 
cients  are  encoded  with  fixed-rate  scalar  quantizers.  (Entropy 
coding  is  not  used  because  of  its  idiosyncratic  sensitivity  to 
channel  errors.)  For  COTC,  the  quantizers  and  bit  allocations 
are  optimized  for  the  BSC  as  in  [2].  For  TSCC,  the  quantizers 
are optimized for a noiseless channel, with conventional bit
allocations, and the (n, k, m) R-S encoder is systematic. When
the BSC output is within the error correcting capability, t,
of some R-S codeword, the R-S decoder produces the first k
symbols of that codeword. Otherwise, the decoder
is said to FAIL, and it simply produces the first k channel
output symbols.

II.  Distortion,  Complexity  and  Optimization 

The MSE of the COTC may be computed as in [2]. That
of the TSCC can be computed in the form

E[D] = E[D | no fail] Pr(no fail) + E[D | fail] Pr(fail).

When t > 4, the probability of decoding error given no failure
is negligible. Thus, E[D | no fail] is the usual distortion of the
transform code on a noiseless channel. The computation of
Pr(fail) is elementary, and a detailed method for computing
E[D | fail] has been developed.
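
A rough numerical sketch of this decomposition is given below. It rests on an assumption not stated in the text: Pr(fail) is approximated by the probability that more than t of the n R-S symbols (each made of m BSC uses with crossover probability p) are in error, which neglects the rare event of landing within distance t of a wrong codeword. The function names are hypothetical.

```python
from math import comb

def prob_more_than_t_symbol_errors(n, m, t, p):
    """Binomial tail: probability that more than t of n m-bit symbols are hit."""
    ps = 1.0 - (1.0 - p) ** m  # probability that an m-bit symbol contains an error
    return sum(comb(n, j) * ps**j * (1.0 - ps) ** (n - j) for j in range(t + 1, n + 1))

def expected_distortion(d_no_fail, d_fail, pr_fail):
    """E[D] = E[D | no fail] Pr(no fail) + E[D | fail] Pr(fail)."""
    return d_no_fail * (1.0 - pr_fail) + d_fail * pr_fail
```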

¹ Work supported by ARO MURI grant DAAH04-96-1-0337.


As a measure of complexity C, we sum estimates of the
numbers of arithmetic operations per data sample used in
encoding and decoding. For COTC, C = 4L − 2 + R, where L =
transform dimension and R = number of BSC uses per data
sample. For the tandem code, C = 4L − 2 + Rk/n + (αk +
βn)(n − k)R/nm, where α = 2 and β = 10.
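
The two complexity measures can be evaluated directly. In the sketch below the last term of the tandem formula is read as (αk + βn)(n − k)R divided by the product nm, which is an interpretation of the inline fraction; the function names are hypothetical.

```python
def cotc_complexity(L, R):
    """Operations per data sample for COTC: C = 4L - 2 + R."""
    return 4 * L - 2 + R

def tscc_complexity(L, R, n, k, m, alpha=2, beta=10):
    """Operations per data sample for the tandem code (denominator n*m assumed)."""
    return 4 * L - 2 + R * k / n + (alpha * k + beta * n) * (n - k) * R / (n * m)
```

When k = n (no parity symbols) the tandem complexity reduces to that of COTC, as expected from the formulas.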

Using  the  above  methods  for  computing  the  MSE  and  com¬ 
plexity,  an  optimization  program  searched  for  the  best  param¬ 
eters  for  systems  of  each  type,  subject  to  a  complexity  con¬ 
straint.  The  Gauss-Markov  correlation  coefficient  is  fixed  at 
0.9.  The  BSC  error  probability  p  and  the  number  of  BSC  uses 
per data sample R are also fixed at various values. The TSCC
parameters are L, n, m, k; the only COTC parameter is L.

III.  Results  and  Conclusions 

A  representative  result  of  the  optimizations  is  provided  by 
Figure  1.  For  small  values  of  complexity,  channel-optimized 
transform  coding  is  better.  However,  as  complexity  increases, 
its  performance  tends  to  saturate  and  tandem  coding  becomes 
better.  In  other  words,  there  appears  to  be  a  threshold  such 
that  tandem  coding  is  better  than  channel-optimized  trans¬ 
form  coding  when  and  only  when  the  available  complexity  is 
above  this  threshold.  Other  results,  not  shown,  indicate  that 
the  complexity  threshold  decreases  as  the  BSC  error  proba¬ 
bility  decreases. 

Fig. 1: Performance (S SNR) vs. complexity (op’s/sample).

Abstract — A new trellis representation for variable
length codes (VLC) is proposed which allows
soft-in/soft-out decoding of these codes. By applying the
BCJR algorithm on this trellis, either symbol-level or
bit-level reliability information for the variable length
coded sequence can be obtained. By using this
soft-in/soft-out VLC decoder for iterative (“turbo”) decoding
of a serially concatenated scheme consisting
of an outer variable length code and an inner
convolutional code separated by an interleaver, significant
gains can be obtained compared to an instantaneously
decoded variable length code of the same
overall source and channel code rate.

I.  Introduction 

Recently  several  schemes  have  been  proposed  to  perform 
decoding  of  variable  length  codes  by  considering  the  overall 
sequence  rather  than  decoding  the  VLC  coded  symbol  stream 
instantaneously  using  the  prefix  property  of  these  codes.  Some 
of  these  approaches  also  use  trellis  representations  of  vari¬ 
able  length  encoded  symbol  sequences  and  carry  out  either 
maximum  likelihood  (ML)-  or  maximum  a  posteriori  (MAP)- 
sequence  estimation  to  decode  the  source  symbols.  Although 
in  [2]  symbol-level  soft-output  was  proposed,  the  soft-output 
was  not  used  for  further  processing.  We  present  a  soft-in/soft- 
out  VLC  decoder  which  can  be  used  in  an  iterative  decoding 
scheme. 

II.  Trellis  Representation 

Consider a source that independently produces outputs
selected from an M-ary alphabet U = {0, ..., M − 1}. A vector
u of K source output values is mapped to a vector c of
codewords taken from a variable length code C for the given
symbol alphabet. Let l be an M-tuple that defines the lengths
of the codewords. The total bit-length of
the VLC vector c is denoted by N. Every sequence consisting
of K symbols and N bits can be graphically represented by
a trellis-like structure as shown in Figure 1 for K = 4 and
N = 6, where the horizontal axis represents the symbol time
and the vertical axis represents the bit time. The alphabet size
in the example is M = 3 and the lengths of the codewords are
l = (1, 2, 3). Furthermore, let the vector c be channel coded
and transmitted over a noisy channel.
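
The state space of such a trellis can be enumerated with a short sketch. It assumes that a state is a pair (symbol time k, bit time n) and that only states lying on a path from (0, 0) to (K, N) are kept; the function names are hypothetical.

```python
def vlc_trellis(K, N, lengths):
    """Return, for each symbol time k, the set of bit times n that lie on
    some path from (0, 0) to (K, N)."""
    # Forward pass: bit positions reachable after k symbols.
    forward = [set() for _ in range(K + 1)]
    forward[0].add(0)
    for k in range(K):
        for n in forward[k]:
            for l in lengths:
                if n + l <= N:
                    forward[k + 1].add(n + l)
    # Backward pass: keep only states from which (K, N) is still reachable.
    alive = [set() for _ in range(K + 1)]
    if N in forward[K]:
        alive[K].add(N)
    for k in range(K - 1, -1, -1):
        for n in forward[k]:
            if any(n + l in alive[k + 1] for l in lengths):
                alive[k].add(n)
    return alive

def count_paths(alive, lengths):
    """Number of symbol sequences, i.e. paths through the terminated trellis."""
    K = len(alive) - 1
    paths = {n: 1 for n in alive[K]}
    for k in range(K - 1, -1, -1):
        paths = {n: sum(paths.get(n + l, 0) for l in lengths) for n in alive[k]}
    return sum(paths.values())
```

For the parameters of the example (K = 4, N = 6, codeword lengths 1, 2 and 3), the sketch yields 11 states and 10 distinct symbol sequences.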

III.  Decoder  Structure 

As the above-mentioned trellis is terminated, it can easily be
seen that maximum a posteriori (MAP) decoding according to
the BCJR algorithm [1] can be applied on this trellis. Thereby
decoding can be carried out either on a symbol-by-symbol
basis along the horizontal axis or on a bit-by-bit basis along
the vertical axis. If decoding is done vertically, the output
values of the decoder are a posteriori probabilities (APP) for
the bits of the variable length coded sequence c. Let us assume
a concatenated coding scheme with a variable length code as
outer code and a channel code as inner code separated by an
interleaver. If the APP-VLC decoder works in the bit-level
mode, the soft-output can be used as a priori information for a
second run of the inner soft-in/soft-out channel decoder. This
results in the well known structure of an iterative decoder for
a serially concatenated coding scheme.


Figure 1: Example for VLC-Trellis


This new iterative approach in source/channel decoding
with variable length codes results in significant performance
gains compared to a system with instantaneous VLC decoding
for both AWGN and fully interleaved Rayleigh-fading channels.
Further details about the proposed approach can also be
found in [3].

Abstract — We consider the problem of joint source-channel
coding for progressive packetized transmission
of an image over a packet-loss network whose packet-loss
rate is a random variable. We obtain an algorithm
for unequal erasure protection which, by design, maintains
progressivity, that is, good performance at intermediate
transmission budgets.

Embedded  source  coders  for  images  are  attractive  because 
they  provide  a  single  bitstream  capable  of  progressively  repro¬ 
ducing  the  image  at  different  distortions  and  rates  (bit  bud¬ 
gets).  Maintaining  such  a  progressive  capability,  when  the 
communication  channel  is  noisy  or  lossy,  requires  design  of  a 
joint  source-channel  coding  scheme  which  generates  a  single  bit- 
stream,  simultaneously  taking  into  account  the  distortion-rate 
performance  at  a  number  of  transmission  budgets.  In  this  work, 
we undertake the design of a system for progressive image
transmission over a lossy packet network with unknown packet-loss
characteristics in the absence of any network/transport layer
loss recovery mechanism and feedback channel (e.g., transmission
using the User Datagram Protocol (UDP)). We assume the
packet length to be fixed. We model a network with unknown
packet-loss rate as a compound channel. A compound packet
erasure channel is a finite-state channel whose packet-loss rate
for a session is a random variable with known probability
distribution. In each state s ∈ S, the channel is memoryless with
an associated packet erasure rate e(s). The state is chosen at
the beginning of the transmission session according to a known
probability mass function f_S, and remains unchanged during
the entire session. This model applies to situations such as 1)
transmission of the same bitstream to different receivers facing
different packet-loss rates, and 2) transmission over a time-varying
channel, where the time-scale of variations in packet-loss rates
is much larger than the average length of an image transmission
session (e.g., over the Internet, where the packet-loss rates due
to congestion have hourly and daily variations).

We select a high performance embedded image coder like
SPIHT as the source coder. The joint source-channel coder
design requires the allocation of unequal erasure protection to (i)
incorporate varying sensitivity of source-bits to loss, and (ii)
combat the channel variability. In addition, the design requires
the scheduling of the source and the protection bits in the
transmit bitstream to obtain good progressive transmission. We use
a rate compatible family C = {c_1, c_2, ..., c_J} of (n, k_0) packet
erasure correcting (PEC) codes obtained by puncturing a
Reed-Solomon (RS) parent code for different n and fixed k_0. An (n, k_0)
PEC code corrects n − k_0 packet erasures. The embedded source
coder output is divided into fixed length source-blocks of length
k_0 packets. Each source-block is encoded with a potentially
different  channel  code,  chosen  from  C  according  to  a  code  allo¬ 
cation  policy.  The  joint  source-channel  coder  generates  a  single 
stream  of  packets  and  transmits  over  the  lossy  network.  Some 
of  these  packets  are  lost  or  dropped  by  the  network.  We  assume 
that  the  location  of  dropped  packets  is  known.  At  the  receiver, 
RS  codewords  are  reassembled  and  source-blocks  are  recovered. 
The image is reconstructed only from the longest available
(recovered) prefix of the source-block sequence. A code allocation
policy π given by a sequence {c_π^1, c_π^2, ..., c_π^{N(π)}}, N(π)
being the number of source-blocks, allocates channel
code c_π^i ∈ C to the ith source-block out of the source coder.
Let M_t(π) denote the total number of packets used by policy
π. Let, for c ∈ C, r_c(c) denote the code rate. Then M_t(π)
is computed as M_t(π) = Σ_i k_0 r_c^{-1}(c_π^i). Let PSNR_π
denote the expected Peak-Signal-to-Noise-Ratio (PSNR) at the
receiver while following the policy π over the compound channel.
The code allocation problem under the constraint of a total
transmission budget of M packets is written as

max_π PSNR_π subject to M_t(π) ≤ M. (1)
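
The quantities entering problem (1) can be sketched numerically. The sketch below makes several assumptions beyond the text: a policy is represented by the list of code lengths n_i of its source-blocks, packets are erased independently within a channel state, and `psnr_by_prefix[j]` is a hypothetical table giving the PSNR obtained when exactly the first j source-blocks are recovered.

```python
from fractions import Fraction
from math import comb

def total_packets(policy, k0):
    """M_t = sum_i k0 / r_c(c_i), with code rate r_c = k0 / n_i."""
    return sum(k0 / Fraction(k0, n) for n in policy)

def block_recovery_prob(n, k0, e):
    """Probability that at most n - k0 of the n packets of a block are erased."""
    return sum(comb(n, j) * e**j * (1.0 - e) ** (n - j) for j in range(n - k0 + 1))

def expected_psnr(policy, k0, states, psnr_by_prefix):
    """Average PSNR over the compound channel; states = [(erasure rate, prob), ...]."""
    total = 0.0
    for e, prob in states:
        alive = 1.0   # probability that all blocks seen so far were recovered
        value = 0.0
        for j, n in enumerate(policy):
            r = block_recovery_prob(n, k0, e)
            value += alive * (1.0 - r) * psnr_by_prefix[j]  # prefix stops at block j
            alive *= r
        value += alive * psnr_by_prefix[len(policy)]        # every block recovered
        total += prob * value
    return total
```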

Unlike a fading channel, for a compound channel and a
policy π, PSNR_π does not depend on the order of the packets in
the bitstream. But the expected PSNR at intermediate budgets
is controlled by the order of the packets. The proposed
algorithm addresses the simultaneous ordering and code allocation
problem. The output bitstream is regarded as a sequence of
embedded policies, each designed for an intermediate
transmission budget [1]. Let π*(M) denote the policy obtained by the
proposed algorithm for problem (1) for packet budget M. The
algorithm generates π*(M) from policies π*(j) for 1 ≤ j < M,
in such a way that the resulting policies are embedded by design.
Briefly, the algorithm restricts the search of π*(M) to a union
of (i) all policies obtained by adding one parity-check packet
to a source-block in policy π*(M − 1) and, (ii) all policies of
packet-constraint M, obtained by adding an entire source-block
encoded with some code c ∈ C to one of the policies π*(j) for
j < M. Then, from the (restricted) search space, it selects
the policy π*(M) which maximizes the average PSNR. The
algorithm is greedy and suboptimal, but for each budget M, it
generates a single packet stream which executes π*(M) as well
as π*(j) for a number of intermediate budgets j, 1 ≤ j < M.
Simulations for selected channels and images yield over 0.5 dB
improvement in average PSNR over equal erasure protection
(EEP) schemes across a range of transmission rates. The gains
are higher and the designed allocation is farther away from an
EEP scheme if the variation in the packet-loss rates is higher.

Abstract — An ideal cipher system is suggested
whose coding time is less in order of magnitude than
for known methods.

I.  Introduction 

It is well known in cryptography that it is easy to construct
an unbreakable secret-key cipher system if a plaintext source
generates letters which are independent and equiprobable, even
if the length of a key sequence is much less than the length
of the message [1]. In this paper, we suggest a new secret-key
cipher system in which a generated message is transformed
into two parts in such a way that the larger part consists
of independent and equiprobable bits and only this part is
encrypted. The complexity of the method is exponentially
less than that for other known methods.

We use a common definition of a secret-key cipher system.
As customary, we assume that the secret key k is statistically
independent of the plaintext sequence X_1 X_2 .... We also assume
that the plaintext digits, key digits, and ciphertext digits take
values in the alphabet A = {0, 1}, the key source and the
plaintext source are i.i.d., and key digits are equiprobable,
but the suggested method is easily generalized for the case of
any finite source alphabet and for Markov sources.

In  this  report  a  simply  realisable  ideal  system  is  suggested 
for  the  case  when  the  length  of  a  key  is  much  less  than  the 
length  of  an  encrypted  message. 

II.  Description  of  the  Cipher  System 

The  description  of  the  suggested  cipher  system  may  be  di¬ 
vided  into  two  parts  as  follows:  first,  a  generated  sequence 
of  letters  is  transformed  into  two  subsequences,  and,  second, 
the  biggest  subsequence  which  consists  of  independent  and 
equiprobable  letters,  is  encrypted.  The  first  part  plays  a  key 
role.  It  is  based  on  the  method  of  P.  Elias  [2]  and  the  fast 
algorithm  of  enumeration  from  [3]. 

Let us give some new definitions in order to describe the
method. Let S_n^i be the set of all binary words of length n
with i ones (n ≥ i ≥ 0), and for every x ∈ S_n^i let code(x) be
the lexicographical number of the word x in the set S_n^i,
written in the binary number system; the length of code(x) equals
⌈log binom(n, i)⌉ bits, where binom(n, i) is the binomial coefficient.

A generated plaintext can be written in the form of a
sequence of blocks of length n, where n > 2 is a parameter
of the method. Every block x is encoded by the
sequence of three words u(x)v(x)w(x). Here u(x) is the number
of ones in the block x, and the length of u(x) is equal to
⌈log(n + 1)⌉ bits. In order to describe v(x) and w(x), let k be
the number of ones in x, define
m_k = ⌊log binom(n, k)⌋ (= ⌊log |S_n^k|⌋), and let
α_{m_k} α_{m_k−1} ... α_0 be
a binary notation of |S_n^k|. Let α_{m_k} = 1, α_{j_1} = 1, ..., α_{j_s} = 1
and the other α_j = 0, and let j_1 > j_2 > ... > j_s. Let
β(x) = β_{m_k} β_{m_k−1} ... β_0 be the binary notation of the
lexicographical number of x and let the following inequalities be
valid:

α_{m_k} α_{m_k−1} ... α_{j_r+1} 0 0 ... 0 ≤ β(x) < α_{m_k} α_{m_k−1} ... α_{j_r} 0 0 ... 0

for a certain r, where both bounds are read as binary numbers of
m_k + 1 digits. (Obviously such r exists.) The word w(x) is
defined as follows:

w(x) = β_{j_r−1} β_{j_r−2} ... β_0, if j_r − 1 ≥ 0,
w(x) = Λ, if j_r − 1 < 0,

where Λ is an empty word and j_0 = m_k by definition. At last,
v(x) is a binary notation of the length of the word w(x), and
the length of v(x) is equal to ⌈log(m_k + 1)⌉ bits by definition.
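
The first part of the method can be illustrated by the following sketch. It is a direct, slow enumeration rather than the fast algorithm of [3], it represents blocks as strings of '0' and '1', and the helper names are hypothetical. The set of words with k ones is split into subsets whose sizes are the powers of two appearing in the binary notation of its size, and w(x) is the index of x inside its subset.

```python
from math import comb

def bits(value, width):
    """Binary notation of value on exactly `width` digits (empty word if width is 0)."""
    return format(value, "b").zfill(width) if width else ""

def lex_rank(x):
    """Lexicographical number of x among words of the same length and weight."""
    n, ones, rank = len(x), x.count("1"), 0
    for j, bit in enumerate(x):
        if bit == "1":
            rank += comb(n - j - 1, ones)  # words with the same prefix and a 0 here
            ones -= 1
    return rank

def split_block(x):
    """Return the words (u, v, w) for a block x."""
    n, k = len(x), x.count("1")
    size = comb(n, k)
    m_k = size.bit_length() - 1            # floor(log2 |S_n^k|)
    beta, low = lex_rank(x), 0
    for j in range(m_k, -1, -1):           # j runs over j_0 = m_k > j_1 > ... > j_s
        if (size >> j) & 1:
            if beta < low + (1 << j):
                w = bits(beta - low, j)    # the j lowest digits of beta
                break
            low += 1 << j
    u = bits(k, n.bit_length())            # ceil(log2(n + 1)) digits
    v = bits(len(w), m_k.bit_length())     # ceil(log2(m_k + 1)) digits
    return u, v, w
```

For n = 4 and k = 2 the set has 6 = 4 + 2 words, so four of them yield a two-digit w(x) and the remaining two yield a one-digit w(x).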

Let us describe the second part of the method. It
is convenient to enumerate digits of words w(x_1 ... x_n),
w(x_{n+1} ... x_{2n}), ... and denote them by w_0 w_1 .... Let k =
k_0 k_1 ... k_{t−1} be the key word. The enciphering and deciphering
are defined by the formulas z_i = w_i ⊕ k_{i mod t} and w_i = z_i ⊕ k_{i mod t},
correspondingly. Every symbol z_i is put in the place of the
letter w_i. So an encrypted sequence may be considered as
the sequence u(·)v(·)z(·)u(·)v(·)z(·) ..., where z(·) is the
encrypted word w(·).
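
The enciphering step itself is a bitwise addition modulo 2 with the key used cyclically. A minimal sketch, assuming the digits of the w-words and of the key are given as lists of 0/1 integers (the function name is hypothetical):

```python
def xor_with_key(digits, key):
    """z_i = w_i XOR k_(i mod t); applying it twice with the same key restores w."""
    t = len(key)
    return [d ^ key[i % t] for i, d in enumerate(digits)]
```

Since the operation is an involution, the same function serves for deciphering.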

Theorem: Let there be given a Bernoulli source which
generates letters from the alphabet A = {0, 1} with (unknown)
probabilities p and q, respectively. Let the suggested cipher
system be used for encrypting the source messages with the
blocklength n. Then the following holds:

i) The suggested system is strongly ideal.

ii) The symbols of the sequence w(x_1^n) w(x_{n+1}^{2n}) w(x_{2n+1}^{3n})
... are independent and equiprobable.

iii) E(|w(x_{rn+1}^{(r+1)n})|) > nh − 2 log(n + 1), where h =
−(p log p + q log q) is the entropy of the source and E(·) is an
expectation.

iv) The method requires a memory size of O(n log n) bits
and has a time of encoding and decoding of O(log^3 n log log n)
bit operations per letter as n → ∞.

Remark. The source generates h bits of information per
letter and, therefore, nh bits per block. The suggested cipher
system encrypts symbols of the words w(x_{rn+1}^{(r+1)n}), r = 0, 1, 2, ....
The claim iii) shows that, informally, almost all generated
information is encrypted when n grows.

Abstract — A perfect homophonic coding technique
is devised for which the number of fair coin tosses to
select a homophone is bounded for any source with
rational letter probabilities. The new scheme enlarges
the source alphabet, which paradoxically generally
results in less plaintext expansion than does “optimum”
homophonic coding of the unaugmented source.

I.  Introduction 

In homophonic coding, a multiplicity of “homophones”
(binary words hereafter) are probabilistically substituted for each
plaintext letter so as to reduce the redundancy of the
resulting, new “plaintext” and hence to increase the unicity distance
of the cipher at the cost of some plaintext expansion.
Homophonic coding is perfect if the new plaintext sequence is
irredundant and is optimum if it is perfect and its plaintext
expansion (i.e., the average length of a homophonic word less
the entropy of a source letter) is as small as possible [1].

For simplicity, we consider only binary homophonic
coding of the output sequence of a K-ary discrete memoryless
source (DMS). The homophonic coding problem then reduces
to that for a single K-ary random variable U, but the
theory is easily modified to handle general sources. We assume
that U has a probability distribution with rational entries
P_U(u_i) = m_i/n, 1 ≤ i ≤ K, where m_i and n are positive
integers and n is as small as possible. If and only if n is an
integer power of 2, the number of fair coin tosses to select a
homophone is bounded for perfect homophonic coding of the
DMS U [1]. We show now how to achieve this for all n.

II.  The  New  Scheme 

The “trick” is to augment the source U with a “dummy”
letter A to which we assign probability P_Û(A) = (2^N − n)/2^N,
where N = ⌈log_2 n⌉ and Û denotes the augmented source. This
forces the choices P_Û(u_i) =
(n/2^N) P_U(u_i) = m_i/2^N for 1 ≤ i ≤ K. The letter
probabilities for the augmented source are thus rational numbers
with a common denominator of 2^N and hence at most N fair
coin flips are required to choose the homophone if optimum
standard homophonic coding [1] is now applied. Surprisingly,
the resulting plaintext expansion is generally less than for
standard “optimum” homophonic coding of the original source.
Example: Consider the binary DMS with P_U(u_1) = 1/3 and
P_U(u_2) = 2/3. “Optimum” homophonic coding uses an
unbounded number of fair coin tosses for homophone selection
and gives an average word length E[W] = 2. The plaintext
expansion is E[W] − H(U) = 2 − h(1/3) ≈ 1.082, where h(p) is
the binary entropy function. N = ⌈log_2 3⌉ = 2, so we augment
U with a dummy letter A with P_Û(A) = (4 − 3)/4 = 1/4.
Then P_Û(u_1) = (3/4)(1/3) = 1/4 and P_Û(u_2) = (3/4)(2/3) =
1/2, so at most two fair coin flips are needed to select a
homophone. All letter probabilities for Û are negative integer
powers of 2 and hence E[W] = H(Û). The average number of
letters from the original source U that are encoded with the
encoding of one letter of Û is p = n/2^N = 3/4. The
plaintext expansion of the new scheme is thus E[W] − pH(U) =
H(Û) − pH(U) = 3/2 − (3/4)h(1/3) ≈ 0.811, which is
substantially less than the plaintext expansion 1.082 for standard
“optimum” homophonic coding of the original source U.

The new scheme can be implemented as follows. One first tests for the
occurrence of an event of probability $p = 1 - P_{\hat U}(A) = n/2^N$,
which requires at most N flips of a fair coin. If the event occurs, one
calls on the source U to emit a letter that then becomes the output of
$\hat U$. Otherwise, the dummy letter A becomes the output of $\hat U$.
Decoding is simple: one just deletes the dummy letters from the
reconstructed output sequence of $\hat U$ to obtain the output sequence
of U.
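The augmentation and the resulting plaintext expansion can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and `expected_length` assumes that optimum homophonic coding splits each dyadic letter probability into distinct negative powers of 2 (its binary expansion) and gives a homophone of probability $2^{-w}$ a codeword of length w.

```python
from fractions import Fraction
from math import gcd, log2


def augment(probs):
    """Augment a rational source with a dummy letter (appended last).

    Returns (augmented probabilities, N, p) with p = n / 2^N.
    """
    probs = [Fraction(q) for q in probs]
    n = 1
    for q in probs:  # n = least common denominator of the letter probabilities
        n = n * q.denominator // gcd(n, q.denominator)
    N = (n - 1).bit_length()       # N = ceil(log2 n)
    p = Fraction(n, 2 ** N)
    augmented = [p * q for q in probs]
    if p != 1:
        augmented.append(1 - p)    # dummy letter, probability (2^N - n)/2^N
    return augmented, N, p


def expected_length(dyadic_probs, N):
    """E[W] under the assumed homophone rule (see lead-in)."""
    total = Fraction(0)
    for q in dyadic_probs:
        m = int(q * 2 ** N)        # numerator over the common denominator 2^N
        for j in range(N + 1):
            if (m >> j) & 1:       # homophone of probability 2^(j-N), length N-j
                total += Fraction(N - j, 2 ** (N - j))
    return total


def entropy(probs):
    """Entropy in bits of a probability vector."""
    return -sum(float(q) * log2(float(q)) for q in probs if q > 0)


def plaintext_expansion(probs):
    """E[W] - p H(U) for the augmented scheme."""
    augmented, N, p = augment(probs)
    return float(expected_length(augmented, N)) - float(p) * entropy(probs)
```

For the source with probabilities 1/3 and 2/3, `augment` returns the probabilities 1/4, 1/2 and 1/4 (dummy letter last), and `plaintext_expansion` evaluates to about 0.811, in agreement with the example.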

III.  Bounds  on  Plaintext  Expansion 
Proposition 1 Let U be a K-ary discrete memoryless source whose letter
probabilities are all rational numbers, and let n, N and p be as defined
above. Then standard optimum homophonic coding [1] of the augmented
source $\hat U$ achieves a plaintext expansion $E[W] - pH(U)$ satisfying
the bounds

$h(p) \le E[W] - pH(U) < h(p) + 2 - 2^{3-N}$, all $N \ge 3$, $p \ne 1$ (1)

and, if a) the letter probabilities of U written as fractions with
denominator n all have numerators that are integer powers of 2 and
b) $n = 2^N - 2^i$ for some i, $0 \le i \le N - 2$, satisfying

$E[W] - pH(U) = h(p)$. (2)

The lower bound in (1) follows immediately from the fact that
$E[W] \ge H(\hat U) = h(p) + pH(U)$. The upper bound for $N \ge 3$
follows from the bound in [2] upon noting that when $p \ne 1$ there can
be at most N - 1 terms in the expression for the probability of any
letter of $\hat U$ as a sum of distinct negative integer powers of 2.

The equality in (2) follows by noting that conditions a) and b) are
necessary and sufficient for the probabilities of all letters of
$\hat U$ to be negative integer powers of 2 or, equivalently, to have
$E[W] = H(\hat U) = h(p) + pH(U)$. Note that conditions a) and b) are
always satisfied when N = 2 and $p \ne 1$.

Example: Consider the DMS U with letter probabilities 2/3, 1/6 and 1/6.
Here n = 6 and N = 3. Conditions a) and b) are satisfied, so (2) holds
and the plaintext expansion is
$E[W] - pH(U) = h(p) = h(3/4) \approx 0.811$.
Abstract — We discuss a strategy initiated by Boneh and Shaw for
Collusion-Secure Fingerprinting. We show that under this strategy,
finding fingerprinting schemes that resist coalitions of two users
amounts to finding $B_2$-sequences of binary vectors. A sequence of
vectors $v_1, v_2, \ldots, v_n$ is a $B_2$-sequence if all sums
$v_i + v_j$, $1 \le i < j \le n$, are different: the associated extremal
set-theoretic problem is, what is the maximal size of a $B_2$-sequence?
We shed new light on this old combinatorial problem and improve on
previously known upper bounds.

I.  Marking  Assumptions  and  duplication 

Suppose a Distributor wishes to create and distribute a large number of
copies of a large binary file $\Phi \in \mathbb{F}_2^N$. In order to
trace illegal copies he will mark each copy of $\Phi$. The marking
process of some copy of $\Phi$ consists of changing the bits of $\Phi$
belonging to some subset of a privileged set $M \subset \{1, \ldots, N\}$
of coordinates called marks. The subset of marks associated to a copy of
$\Phi$ is called a fingerprint and can be seen as a binary vector of
length m = |M|. The set of marks M is unknown to anyone but the
distributor. It is supposed to be a small subset of $\{1, \ldots, N\}$,
so that modifying a fingerprint by randomly changing a few bits of a
copy of $\Phi$ is inefficient.

The problem of collusion occurs when a coalition of c pirate users
compare their fingerprinted copies: whenever their copies differ on some
coordinate, they will know it is a mark. They can then produce an
illegal copy by changing at will bits on the subset of marks they have
found out.

We shall concern ourselves with the case c = 2.

Boneh and Shaw use the following duplication trick: the set of
fingerprints is actually constructed from a code
$C \subset \mathbb{F}_2^n$, where m = tn. A fingerprint X is constructed
from a codeword $x \in C$ simply by duplicating each symbol t times,
i.e. changing 0 to $00 \cdots 0$ and 1 to $11 \cdots 1$. Let us call the
set of t coordinate positions of X that stem from a single coordinate of
x a block. The partition of the set M of marks into blocks is kept
secret by the distributor. Thus, when two pirate users compare their
fingerprinted copies they will have no way of deciding whether two
uncovered marks belong to the same block or not, so whenever the pirates
decide to change a fraction p of the set of uncovered marks, they will,
on average, change a fraction p of the marks belonging to any single
block.

Suppose two colluding pirates are in possession of two legal copies
fingerprinted by X and Y and that X and Y originate from $x, y \in C$.
The pirates have essentially the following type of strategy: they can
pick one of the copies, fingerprinted by X say, and change randomly and
independently with probability p every coordinate of the set of
uncovered marks. Their only degree of freedom is p.


On  the  other  hand,  when  confronted  with  this  illegal  copy 
of  $  the  distributor  will  try  to  trace  one  of  the  legal  copies  it 
was  constructed  from,  i.e.  reconstruct  x  or  y.  The  distributor 
has  two  strategies: 

1. He can associate to every block a binary symbol by majority decision.

2. The alternative strategy is to associate to any corrupted block
which contains both zeros and ones a third erased symbol, say e. This
strategy yields a ternary vector

$\zeta \in \{0, 1, e\}^n$.

What duplication ensures is that the second strategy need be applied
only when the first has failed to produce a legitimate codeword
$z \in C$. With high probability this will happen only when the pirates
have chosen p sufficiently separated from 0 and 1. In that case the
second strategy will yield with high probability the ternary vector

$\zeta = \zeta(x, y)$

defined by $\zeta_i = x_i = y_i$ when $x_i = y_i$ and by $\zeta_i = e$
when $x_i \ne y_i$.

If the code C has the property that $\zeta(x, y)$ always identifies
{x, y} for any pair of codewords {x, y}, then, for any sufficiently big
duplication parameter t, one of the two decoding strategies will almost
always identify x or y.

II. $B_2$-Sequences

Let x and y be two codewords of a code $C \subset \mathbb{F}_2^n$.
Notice that $\zeta(x, y)$ is obtained from the real sum x + y by
changing every 2 coordinate into a 1, every 1 coordinate into e and
leaving 0's unchanged. The relevant identifying property that we need
from C is that the real sums x + y are all different for every pair
{x, y} of codewords. Such a code has been named a $B_2$-sequence by
Lindström [2].
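The identifying property can be tested by brute force on small codes. The sketch below is illustrative only: `zeta`, `zeta_from_sum` and `is_b2` are hypothetical helper names, codewords are tuples of 0/1 integers, and the string "e" stands for the erased symbol.

```python
from itertools import combinations


def zeta(x, y):
    """Ternary vector: the common symbol where x and y agree, 'e' elsewhere."""
    return tuple(a if a == b else "e" for a, b in zip(x, y))


def zeta_from_sum(x, y):
    """Same vector, obtained from the real sum x + y (2 -> 1, 1 -> 'e', 0 -> 0)."""
    mapping = {0: 0, 1: "e", 2: 1}
    return tuple(mapping[a + b] for a, b in zip(x, y))


def is_b2(code):
    """True if the real sums x + y are distinct over all pairs {x, y} of codewords."""
    seen = set()
    for x, y in combinations(code, 2):
        s = tuple(a + b for a, b in zip(x, y))
        if s in seen:
            return False
        seen.add(s)
    return True
```

For instance, {00, 01, 10} passes the test, while adding 11 breaks it because 00 + 11 = 01 + 10.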

Denote by $R = \limsup_{n \to \infty} \frac{1}{n} \log_2 |C_n|$ the
maximum rate of a $B_2$-sequence. The previously known best bounds were
[2]:

$0.5 \le R \le 0.6$ (1)

Using results from [3] we obtain:

Theorem 1

$R \le 0.5752\ldots$
Abstract — We present a simple calculus for deriving conditional
independence relations of events and random variables and show how it
can be applied to simplify, generalize and sometimes strengthen
cryptographic security proofs relying on the indistinguishability of
certain types of probabilistic constructions relevant in cryptography.

The goal of our calculus is similar in spirit to those of other authors
[3, 2, 1], but a crucial difference, important in our applications,
appears to be that in addition to random variables occurring in the
conditional independence relations, we also consider events.

The core of many classic security proofs of cryptographic systems
relying on a pseudo-random function (e.g. implemented by a block cipher)
is a proof that an idealized version of the system, with the
pseudo-random function replaced by a random function, is statistically
very close to a perfect system modeling the desired ideal behavior of
the system. More precisely, it is proved that no adaptive distinguisher
algorithm, even with unbounded computational resources, can distinguish
the ideal and the perfect system with non-negligible probability, unless
it queries the system for an infeasibly large number of inputs.

Our  approach  is  based  on  defining  appropriate  conditioning 
events  for  the  idealized  system  such  that  if  the  event  occurs, 
then  it  behaves  exactly  like  the  perfect  system.  In  this  short 
abstract  we  cannot  sketch  the  cryptographic  applications. 

Definition 1 Two events A and B are conditionally independent, given the
event C, denoted [A; B | C], if
$P(A \cap B \cap C) \cdot P(C) = P(A \cap C) \cdot P(B \cap C)$ or, more
briefly, if

$P(ABC) \cdot P(C) = P(AC) \cdot P(BC)$.

If P(C) > 0, this is equivalent to
$P(AB \mid C) = P(A \mid C) \cdot P(B \mid C)$. The concept of
conditional independence and this notation can be extended to random
variables:

Definition 2 Let S, T and U each be an event, a random variable, or a
list consisting of events and random variables. S and T are
conditionally independent given U, denoted [S; T | U], if the
conditional independence relation according to Definition 1 holds for
all possible triples of events resulting when the random variables in
S, T and U take on particular values.

The following theorem states under which condition an event or random
variable can be deleted from an independence set or the conditioning
set, shifted from an independence set to the conditioning set, or vice
versa. Any random variable in an independence set can be deleted and, if
it is accompanied in the set only by other random variables, it can also
be moved to the conditioning set.

1 Department of Computer Science, ETH Zurich, CH-8092 Zurich,
Switzerland. E-mail: maurer@inf.ethz.ch.


Theorem 1 Consider a fixed random experiment and let S, T, U and V each
be an event, a random variable, or a list consisting of events and
random variables. If [S; T | V], then [S; U | TV] and [S; TU | V] are
equivalent, i.e., one implies the other:

$[S; T \mid V] \wedge [S; U \mid TV] \Rightarrow [S; TU \mid V]$ (1)

and

$[S; T \mid V] \wedge [S; TU \mid V] \Rightarrow [S; U \mid TV]$. (2)

If U is a random variable (or a list of random variables), then

$[S; TU \mid V] \Rightarrow [S; T \mid V]$, (3)

$[S; TU \mid V] \Rightarrow [S; U \mid TV]$, (4)

and

$[S; T \mid UV] \wedge [S; U \mid V] \Rightarrow [S; T \mid V]$. (5)


Proof. It suffices to prove the first claim for the case when S, T, U
and V all are events. If some of the quantities S, T, U and V are random
variables or lists of random variables, the fact that the implication
holds for all events obtained by letting these random variables take on
particular values implies that it also holds for the random variables.
[S; T | V] is equivalent to

$P(STV) \cdot P(V) = P(SV) \cdot P(TV)$, (6)

[S; U | TV] is equivalent to

$P(STUV) \cdot P(TV) = P(STV) \cdot P(TUV)$, (7)

and [S; TU | V] is equivalent to

$P(STUV) \cdot P(V) = P(SV) \cdot P(TUV)$. (8)

Equation (8) is obtained from (6) and (7) by multiplying the left side
of (7) with the left side of (6) and the right side of (7) with the
right side of (6), and canceling the terms P(STV) and P(TV) appearing on
both sides. Similarly, (7) is obtained from (6) and (8) by multiplying
the left side of (8) with the right side of (6) and the right side of
(8) with the left side of (6), and canceling the terms P(V) and P(SV)
appearing on both sides. The second part of the proof is omitted. □
Note that if S, T, and U are events, then
$[S; TU] \Rightarrow [S; T \mid U]$ and $[S; TU] \Rightarrow [S; T]$ are
false in general. For instance, let P(S) = P(T) = P(U) = 0.5,
P(ST) = P(SU) = P(TU) = 0.2, and P(STU) = 0.1. Then
P(STU) = P(S)P(TU) = 0.1 but
$P(STU)P(U) = 0.05 \ne P(SU)P(TU) = 0.04$.
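The counterexample can be verified with exact arithmetic. The sketch below builds one joint distribution over the eight atoms of (S, T, U) that is consistent with the stated probabilities; the atom table and the helper `P` are illustrative choices, not taken from the abstract.

```python
from fractions import Fraction as F

# Atom (s, t, u) -> probability; 1 means the event occurs.
atoms = {
    (1, 1, 1): F(1, 10),
    (1, 1, 0): F(1, 10), (1, 0, 1): F(1, 10), (0, 1, 1): F(1, 10),
    (1, 0, 0): F(2, 10), (0, 1, 0): F(2, 10), (0, 0, 1): F(2, 10),
    (0, 0, 0): F(0),
}
S, T, U = 0, 1, 2


def P(*events):
    """Probability that all listed events occur jointly."""
    return sum(w for atom, w in atoms.items() if all(atom[i] for i in events))


s_indep_tu = P(S, T, U) == P(S) * P(T, U)                   # [S; TU]
s_indep_t_given_u = P(S, T, U) * P(U) == P(S, U) * P(T, U)  # [S; T | U]
s_indep_t = P(S, T) == P(S) * P(T)                          # [S; T]
```

Here `s_indep_tu` is true while `s_indep_t_given_u` and `s_indep_t` are both false, so neither implication holds for events.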
Abstract — We introduce a closed-form blind channel estimator for
multiple-input multiple-output (MIMO) finite impulse-response (FIR)
systems, based only on second-order statistics (SOS). We rely on
correlative filters at each transmitter to induce a spectral asymmetry
between the users. No additional power or bandwidth, nor synchronization
between the sources, is required, and the original data rate is
maintained. We show that, under a simple spectral condition on the
transmitted random processes, this data preprocessing makes the MIMO
channel uniquely determined (up to a phase offset per user) from the SOS
of the MIMO system outputs. The closed-form algorithm which attains this
channel identifiability bound is briefly discussed.

I.  Problem  Formulation 

Consider the P-input/N-output MIMO system,
$y(t) = \sum_{p=1}^{P} H_p \mathbf{s}_p(t) + w(t)$; here,
$y(t) \in \mathbb{C}^N$, $H_p \in \mathbb{C}^{N \times M_p}$,
$\mathbf{s}_p(t) = [s_p(t)\ s_p(t-1)\ \cdots\ s_p(t - M_p + 1)]^T$,
$s_p(t)$ is the scalar signal emitted by the pth user, and
$w(t) \in \mathbb{C}^N$ denotes additive noise. Our goal is the blind
estimation of the MIMO channel matrix $H = [H_1 \cdots H_P]$ from the
SOS of the observed data y(t). We assume: (A1) P (number of users) is
known and H is full column rank, and (A2) the $s_p(t)$'s are
uncorrelated zero-mean unit-power wide sense stationary processes, and
the noise correlation matrices $R_w(\tau) = E\{w(t) w(t-\tau)^H\}$ are
known; $s_p(t)$ and w(t) are independent processes. Here, for
simplicity, we also assume (A3) that $M_1, \ldots, M_P$ (users' channel
orders) are known.

II.  Channel  Identifiability 

As is well known, the MIMO channel matrix H is not unambiguously defined
from the SOS of its outputs,
$\mathcal{R}_y = \{R_y(\tau) : \tau \in \mathbb{Z}\}$, if the sources
are white up to second order, i.e.,
$r_{s_p}(\tau) = E\{s_p(t) s_p(t-\tau)^*\} = \delta(\tau)$ (Kronecker
delta). To make H identifiable from $\mathcal{R}_y$, we propose to color
the sources; i.e., the pth user, rather than transmitting the white
information sequence, say $a_p(t)$, emits the output of a correlative
FIR filter, $s_p(t) = \sum_{l=0}^{L_p - 1} f_p(l) a_p(t - l)$. Moreover,
suppose that the correlative filters
$f_p = (f_p(0), \ldots, f_p(L_p - 1))$ are designed so as to satisfy:
(A4) for each $p \ne q$, there is a correlation lag $\tau = \tau(p, q)$
such that $\sigma(A_p(\tau)) \cap \sigma(A_q(\tau)) = \emptyset$. Here,
$\sigma(X)$ denotes the spectrum (set of eigenvalues) of X, and
$A_p(\tau) = R_{\mathbf{s}_p}(0)^{-1/2} R_{\mathbf{s}_p}(\tau) R_{\mathbf{s}_p}(0)^{-1/2}$
is the normalized autocorrelation matrix for the vector process
$\mathbf{s}_p(t)$. Then, we have Theorem 1.


Theorem 1. Under (A1)-(A4), H is uniquely determined (up to a phase
offset per user) from the SOS of the MIMO system outputs
$\mathcal{R}_y$.

Condition (A4) on the correlative filters is not very restrictive. In
fact, let $M_p$ and $L_p$ ($p = 1, \ldots, P$) be given, where
$L_p > 1$ (i.e., each correlative filter $f_p$ has memory). Denote by
$\mathcal{M}_L$ the set of all unit-norm minimum-phase FIR filters of
degree L, let $\mathcal{M} = \prod_{p=1}^{P} \mathcal{M}_{L_p}$
(Cartesian product), and denote by $\mathcal{F}$ the subset of
$\mathcal{M}$ which satisfies (A4). Then, Theorem 2 holds.

Theorem 2. $\mathcal{F}$ is dense in $\mathcal{M}$.

Proofs of both theorems can be found in [1]. Notice that unit-norm
correlative filters are required in order to maintain the original
transmitted power. The minimum-phase property is desirable as it permits
estimating the information symbols $a_p(t)$ by directly inverting the
filters $f_p$ (once H is identified).
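The effect of coloring on the second-order statistics of a single source can be illustrated numerically. For a unit-power white input $a_p(t)$, the filter output has autocorrelation $r_{s_p}(\tau) = \sum_l f_p(l + \tau) f_p(l)^*$. The sketch below evaluates this sum; the function names are hypothetical, and the scalar lags shown are only an illustration of the induced asymmetry, not a check of condition (A4), which concerns the eigenvalues of the matrices $A_p(\tau)$.

```python
import math


def unit_norm(f):
    """Scale filter taps to unit norm, preserving the transmitted power."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in f))
    return [complex(c) / norm for c in f]


def source_autocorrelation(f, tau):
    """r_s(tau) = sum_l f(l + tau) * conj(f(l)) for a unit-power white input."""
    if tau < 0:
        # Stationarity: r_s(-tau) = conj(r_s(tau)).
        return source_autocorrelation(f, -tau).conjugate()
    return sum(
        (f[l + tau] * f[l].conjugate() for l in range(len(f) - tau)),
        0j,
    )


f1 = unit_norm([1.0, 0.5])    # illustrative two-tap filter for user 1
f2 = unit_norm([1.0, -0.5])   # illustrative two-tap filter for user 2
```

With these taps both sources keep unit power, `source_autocorrelation(f1, 0)` being 1, while their lag-1 correlations are 0.4 and -0.4 respectively, so the two users are no longer spectrally identical.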

III.  Blind  Identification  Algorithm 

We just outline the three main steps of the algorithm (for details, see
[1]). We exploit Theorem 1 as the basis of our identification strategy.
It guarantees that, if G : N x M ($M = \sum_{p=1}^{P} M_p$) satisfies
the equations $R_y(\tau) = G R_s(\tau) G^H + R_w(\tau)$
($\tau \in \mathbb{Z}$), then G = H (up to a phase offset per user);
here, $R_s(\tau)$ are the correlation matrices of
$\mathbf{s}(t) = [\mathbf{s}_1(t)^T \cdots \mathbf{s}_P(t)^T]^T$. Since
the MIMO system is FIR, we only have to consider a finite number of
equations, say for $\tau \in \mathcal{T} = \{\tau_1, \ldots, \tau_K\}$.
Let $R(\tau) = R_y(\tau) - R_w(\tau)$ (denoised output correlation
matrices).

Step 1: We obtain $G_0 = H R_s(0)^{1/2} Q^H$, where
$Q = [Q_1 \cdots Q_P]$ is an (unknown) unitary matrix, as the square
root of R(0).

Step 2: Focus on the pth user, and let
$B(\tau) = G_0^{\dagger} R(\tau) G_0^{\dagger H} = \sum_{q=1}^{P} Q_q A_q(\tau) Q_q^H$
($\dagger$ denoting the pseudo-inverse). Due to the unitary structure of
Q, $Q_p$ satisfies the linear system (in the unknown X)
$\mathcal{L}$: $B(\tau) X - X A_p(\tau) = 0$, $\tau \in \mathcal{T}$. It
turns out that, due to (A4), $Q_p$ is the unique solution (within a
scalar factor). Thus, solving $\mathcal{L}$ and rescaling yields
$U_p = Q_p e^{i\theta_p}$.

Step 3: Let $U = [U_1 \cdots U_P]$. Then, $G = G_0 U R_s(0)^{-1/2}$ is a
copy of H (up to a phase offset per user).
Abstract — A novel technique for the blind source separation (BSS) of
mutually independent and identically distributed (i.i.d.) discrete-time
sequences is presented. The observed signals are assumed mixed through a
narrow-band (memoryless) multiple-input multiple-output (MIMO) noisy
channel and are then processed by a linear MIMO receiver, whose outputs
should ideally match the transmitted signals. In the proposed approach
(called the Multi-User Kurtosis (MUK) algorithm), the linear receiver's
matrix setting is computed adaptively based on the optimization of a
constrained statistical criterion that involves only second and fourth
order statistics of the receiver's output. At each iteration, the
algorithm combines a stochastic gradient adaptation with a Gram-Schmidt
orthogonalization that enforces its criterion's constraints. The
analysis of its stationary points (presented in [1], [2]) reveals that
it is globally convergent to a zero-forcing (ZF), or decorrelating,
solution, both in the absence of noise and in the presence of
spatio-temporally white additive Gaussian noise.


1. k = 0: initialize $W(0) = W_0$
2. for k > 0
3.   Obtain $W'(k+1)$ from (2)
4.   Obtain $W_1(k+1) = W'_1(k+1) / \|W'_1(k+1)\|$
5.   for j = 2 : p
6.     Compute $W_j(k+1)$ from (3)
7.   Go to 5
8.   $W(k+1) = [W_1(k+1) \cdots W_p(k+1)]$
9. Go to 2

Table 1: The MUK algorithm


first updates the receiver matrix through the following recursion

$W'(k+1) = W(k) + \mu\, \mathrm{sign}(K_a)\, Y^*(k) Z(k)$ (2)

where $\mu$ is the stepsize (a small positive scalar),
$K_a = K(a_l(k))$,
$Z(k) = [\,|z_1(k)|^2 z_1(k) \cdots |z_p(k)|^2 z_p(k)\,]$ (* denotes
complex conjugate), and then it projects each column of $W'(k+1)$ to the
corresponding column of $W(k+1)$ through


I.  Summary 

We consider the standard instantaneous mixture BSS problem: p i.i.d. and
mutually independent zero-mean discrete-time sequences $a_l(k)$,
$l = 1, \ldots, p$, that share the same pdf, are transmitted through a
p x q MIMO linear memoryless channel. The received signal model then
takes the familiar form $Y(k) = H A(k) + n(k)$, where
$A(k) = [a_1(k) \cdots a_p(k)]^T$ is the p x 1 vector of source signals,
H is the q x p channel matrix, Y(k) is the q x 1 vector of received
signal snapshots, n(k) is the q x 1 vector of additive noise samples,
all at time instant k, and T denotes matrix or vector transpose. The
received vector signal Y(k) is subsequently filtered by a q x p “spatial
equalizer” W which produces the p x 1 vector output
$z(k) = [z_1(k) \cdots z_p(k)]^T$. The vector output z(k) is hence given
by $z(k) = W^T Y(k) = W^T H A(k) + n'(k) = G^T A(k) + n'(k)$, where
$G = H^T W$ is the p x p global response matrix and $n'(k) = W^T n(k)$
is the filtered noise at the receiver output.

The MUK algorithm, which is presented in [2], is derived from the
following optimization criterion

$\max_{G} F(G) = \sum_{j=1}^{p} |K(z_j)|$ (1)

subject to: $G^H G = I_p$

where $K(x) = E(|x|^4) - 2E^2(|x|^2) - |E(x^2)|^2$ is the kurtosis of x,
$I_p$ is the p x p identity matrix and H denotes Hermitian transpose (we
also assume $\sigma_a^2 = E(|a_l(k)|^2) = 1$). The algorithm requires
the received signal Y(k) to be spatially pre-whitened, corresponding to
a unitary channel H. At each iteration, it


$W_j(k+1) = \frac{W'_j(k+1) - \sum_{l=1}^{j-1} \left( W_l^H(k+1) W'_j(k+1) \right) W_l(k+1)}{\left\| W'_j(k+1) - \sum_{l=1}^{j-1} \left( W_l^H(k+1) W'_j(k+1) \right) W_l(k+1) \right\|}$ (3)

where $\|X\|^2 = X^H X$. The resulting MUK algorithm is described in
Table 1. In [1], [2], the following theorem was shown regarding the
convergence of the MUK algorithm:


Theorem 1 Both in the absence of noise and in the presence of additive
noise with mutually independent i.i.d. and circularly symmetric Gaussian
components (of variance $\sigma^2$ each), and assuming that Y(k) is
perfectly pre-whitened (corresponding to a unitary H), the only maxima
of the MUK algorithm correspond to a decorrelating (zero-forcing)
detector, i.e. to a solution of the following type

$G = \Phi \Pi$ (4)

where $\Phi = \mathrm{diag}([e^{i\phi_1} \cdots e^{i\phi_p}])$,
$i = \sqrt{-1}$, $\phi_1, \ldots, \phi_p$ are arbitrary phases, and
$\Pi$ is a p x p permutation matrix.
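One iteration of the MUK update, that is, the stochastic gradient step (2) followed by the Gram-Schmidt projection (3), can be sketched as follows. This is an illustrative sketch rather than the authors' code: the receiver matrix is stored as a list of p columns, each a list of q complex numbers, and the names `muk_step` and `sign_ka` (the sign of the source kurtosis $K_a$) are hypothetical.

```python
import math


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(inner(v, v).real)
    return [a / norm for a in v]


def muk_step(W, Y, mu, sign_ka):
    """Apply update (2) and then the column-wise Gram-Schmidt projection (3)."""
    # Stochastic gradient step: column j moves along conj(Y) * |z_j|^2 z_j.
    W_prime = []
    for col in W:
        z = sum(w * y for w, y in zip(col, Y))     # z_j = W_j^T Y
        Zj = abs(z) ** 2 * z
        W_prime.append([w + mu * sign_ka * complex(y).conjugate() * Zj
                        for w, y in zip(col, Y)])
    # Gram-Schmidt: remove projections on earlier columns, then normalize.
    W_new = []
    for col in W_prime:
        residual = list(col)
        for prev in W_new:
            c = inner(prev, col)                   # W_l^H W'_j
            residual = [r - c * b for r, b in zip(residual, prev)]
        W_new.append(normalize(residual))
    return W_new
```

After each call the columns of the returned matrix are orthonormal, which is how the iteration keeps the constraint on the receiver when the data are pre-whitened.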
Abstract — We apply a novel signal processing method based on
Independent Component Analysis (ICA) to blind multiuser receivers. ICA
is well suited for blind multiuser detection problems as the criterion
used to separate signals is a mutual information minimization principle
which attempts to separate independent signals from mixed signals. When
the cross-correlations between signature sequences are large, ICA has
better performance than decorrelating receivers and linear MMSE
receivers.

I.  CDMA  System  Description 

We consider the following CDMA system where the signal at a given
receiver consists of the sum of N transmitted user signals embedded in
additive white Gaussian noise [2].

$y(t) = \sum_{k=1}^{N} \sum_{i=-M}^{M} A_k b_k s_k(t - iT - \tau_k) + \sigma n(t)$ (1)

where $A_k$ is the received amplitude of the kth user's signal, and
$b_k \in \{-1, +1\}$ is the bit transmitted by the kth user. $s_k$ is
the deterministic signature sequence assigned to the kth user. The
length of the packets transmitted by each user is 2M + 1.
$\tau_k \in [0, T)$ is the delay of the kth user and $\tau_k = 0$
corresponds to a synchronous channel model. n(t) is white Gaussian noise
with unit power spectral density and $\sigma$ is the noise standard
deviation.

II.  Independent  Component  Analysis 

Independent Component Analysis (ICA) is a linear transformation of data
such that the elements become statistically independent. ICA is well
suited for blind multiuser detection problems. In CDMA communications,
user signals are independent from each other. The channel output is the
linear mixture of multiuser signals and additive white Gaussian noise.

$X = As$ (2)

where X is the channel output, A is a mixing matrix, and s is a vector
containing original user signals and additive white Gaussian noise. ICA
will determine a weight matrix W such that

$\hat{s} = WX$ (3)

where $\hat{s}$ is the estimate of the independent source signals and
the components of $\hat{s}$ are called Independent Components (IC). In
order to make the components of $\hat{s}$ as independent as possible,
ICA will find a linear transformation that can minimize the output
mutual information [1].

1  This  work  is  supported  in  part  by  NSF  Grant  No.9625557. 
III.  XICA  Algorithm  and  Simulation  Results

The XICA algorithm combines the fixed-point algorithm proposed by
Hyvarinen [1] and a detection scheme. After the fixed-point algorithm
converges, we obtain the demixing matrix W, and by using the detection
scheme that we propose, we can separate the desired user signal from the
others. The detection scheme works as follows. First, calculate the
correlation between the desired user signature sequence and each column
of A, which is the inverse of the matrix W. Then take the IC with the
largest absolute correlation; this IC is the desired user signal.
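The selection rule of the detection scheme can be sketched as follows, assuming the columns of the estimated mixing matrix A are already available as lists of real numbers; the function name and the use of the normalized correlation coefficient are illustrative assumptions.

```python
import math


def pick_desired_component(A_columns, signature):
    """Index of the column of A with the largest absolute correlation
    with the desired user's signature sequence."""
    def correlation(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
        return num / den

    scores = [abs(correlation(col, signature)) for col in A_columns]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative length-5 signatures with pairwise correlation 0.2 to s1.
s1 = [1, 1, 1, 1, 1]
s2 = [1, -1, 1, -1, 1]
s3 = [1, 1, -1, -1, 1]
# ICA recovers columns only up to permutation, sign and scale.
A_est = [[-2 * c for c in s2], [0.5 * c for c in s1], [3 * c for c in s3]]
```

Taking the absolute value makes the choice insensitive to the sign and scaling ambiguity of ICA; in this toy setting `pick_desired_component(A_est, s1)` selects the column at index 1.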

If we use Gold codes as spreading sequences, the performance of ICA is
similar to that of the linear MMSE receiver and the decorrelating
receiver. With a set of spreading sequences that have larger
correlations and smaller processing gains, ICA performs better than the
other two linear receivers. We consider a synchronous five-user Gaussian
channel. All users have equal energy, the correlation coefficient is
0.2, and the spreading gain is 5. Figure 1 shows that ICA performs
better than the linear MMSE and the decorrelating receiver. Since
Gaussian noise is independent of all user signals and is symmetric in
all directions, ICA can mitigate the noise by extracting user signals
first and leaving noise behind.
Abstract — We consider blind adaptive multiuser detection in Correlated
Waveform Multiple Access (CWMA)-based cellular radio networks. A common
stochastic approximation (SA) based framework is proposed from which
three blind adaptive algorithms for linear MMSE detection are obtained.
Two of them coincide with previously proposed algorithms and the third
is shown to be best suited for implementation at a base station.
Improvement in terms of convergence properties of these SA-based
adaptation algorithms is sought by using the more recent results on the
SA technique with averaging.

I.  System  Model 

We consider a cellular network model in which there are B base stations
with $K_j$ users assigned to base j. While the transmissions of
out-of-cell users are received symbol-asynchronously at a base station,
it is assumed, for the sake of simplicity, that in-cell users are
symbol-synchronous. A base station is assumed to have knowledge of the
(common) timing of the received signals of only the users in its own
cell. For simplicity, we assume binary antipodal signalling.

The discrete-time model for the $N_j$ matched filter outputs at base j
can be expressed as

$y_j = \sum_{i=1}^{K_j} \sqrt{w_{ij} g_{ijj}}\, s_{ij} b_{ij} + \sum_{l \ne j} \sum_{i=1}^{K_l} \sqrt{w_{il} g_{ilj}} \left( s^-_{ilj} b^-_{il} + s^+_{ilj} b^+_{il} \right) + \eta_j$, (1)

where the channel gain to base j, the transmit power and the transmitted
symbol of the ith user of base l are denoted by $g_{ilj}$, $w_{il}$ and
$b_{il}$, respectively. $s_{ij}$ denotes the vector representation (the
“signature sequence”) of the signal of user i of base j with respect to
a set of orthonormal basis functions that spans at least the in-cell
signal subspace. For the same basis functions, the vectors $s^-_{ilj}$
and $s^+_{ilj}$ denote the segments of the signals associated with the
two symbols of user i of base l, $b^-_{il}$ and $b^+_{il}$,
respectively, that overlap with the symbol of interest at base j.
$\eta_j$ is an $N_j$-dimensional zero-mean Gaussian random vector with a
covariance matrix equal to $\sigma_j^2 I$. For base j, let us denote the
signal matrix for in-cell users as
$S_j = [s_{1j}\ s_{2j} \cdots s_{K_j j}]$ and the diagonal matrix of
in-cell user energies as
$W_j = \mathrm{diag}[w_{1j} g_{1jj}, \ldots, w_{K_j j} g_{K_j jj}]$.
With $i = 1, \ldots, K_l$ and $l \ne j$, the out-of-cell signal and user
energy matrices will be denoted as $S_j^- = [\{s^-_{ilj}\}]$,
$S_j^+ = [\{s^+_{ilj}\}]$ and
$\bar{W}_j = \mathrm{diag}[\{w_{il} g_{ilj}\}]$, respectively.

II.  Blind  Multiuser  Detection  with  Averaging 

We will consider detection at base station 1. The linear MMSE multiuser
detector for user k is given (with suitable scaling) as the unique
solution to the equation $A c_{k1} = s_{k1}$, where
$A = S_1 W_1 S_1^T + S_1^- \bar{W}_1 S_1^{-T} + S_1^+ \bar{W}_1 S_1^{+T} + \sigma_1^2 I$.

From the theory of iterative methods to solve linear equations, we can
form the following general deterministic iteration

$c_{k1}(n) = (I - \mu_n Q A)\, c_{k1}(n-1) + \mu_n Q s_{k1}$, (2)

1 This work was supported in part by NSF Grant NCR-9725778 and ARO
grant DADD 19-99-1-0291.


that converges to the desired $c_{k1}$. Q is a nonsingular matrix, whose
inverse is called the splitting matrix. Replacing Q by the identity
matrix and A by its instantaneous stochastic estimate
$y_1(n) y_1(n)^T$ leads to the stochastic approximation based algorithm
in [1]. Further, using the canonical representation for the (scaled)
MMSE linear detector, $c_{k1} = s_{k1} + p_{k1}$,
$p_{k1} \perp s_{k1}$, we can replace Q by
$P^{\perp}_{k1} = I - s_{k1} (s_{k1}^T s_{k1})^{-1} s_{k1}^T$ (note that
in this case, Q is singular) to adaptively estimate $p_{k1}$:

$s_{k1} + p_{k1}(n) = \left( I - \mu_n P^{\perp}_{k1} y_1(n) y_1^T(n) \right) \left( s_{k1} + p_{k1}(n-1) \right)$. (3)

$\mu_n > 0$ is a suitably chosen fixed or decreasing step-size sequence.
The rule in (3) can be shown to be identical to the one in [2], where it
was derived differently by minimizing the output energy. Both the
recursions mentioned above are based on the knowledge of only each
user's own signal, i.e., $s_{k1}$.
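The deterministic iteration (2) underlying these recursions can be sketched in a few lines. Since $(I - \mu Q A)c + \mu Q s = c + \mu Q (s - Ac)$, each step adds a scaled, preconditioned residual. The sketch assumes a fixed step size and a small dense matrix stored as nested lists; the function names and the 2 x 2 test matrix are illustrative only.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def iterate(A, s, mu, steps, Q=None):
    """Run c(n) = c(n-1) + mu * Q (s - A c(n-1)) from c(0) = 0.

    Q=None is treated as the identity matrix.
    """
    c = [0.0] * len(s)
    for _ in range(steps):
        residual = [si - ai for si, ai in zip(s, matvec(A, c))]
        if Q is not None:
            residual = matvec(Q, residual)
        c = [ci + mu * ri for ci, ri in zip(c, residual)]
    return c


# Illustrative symmetric positive definite A with known solution [1, -1].
A_demo = [[2.0, 0.5], [0.5, 1.0]]
s_demo = matvec(A_demo, [1.0, -1.0])
c_demo = iterate(A_demo, s_demo, mu=0.5, steps=200)
```

With Q equal to the identity, the iteration converges when the eigenvalues of $I - \mu A$ lie inside the unit circle, which holds for this example; a well-chosen Q clusters the eigenvalues of QA and speeds convergence.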

Defining $B = S_1 W_1 S_1^T + \sigma_1^2 I$, we observe that in a
cellular system where the out-of-cell interferers are typically weak, B
can be considered as a coarse approximation of A. Therefore, with the
knowledge of the signals of the in-cell users, their energies, and the
noise variance, we replace Q by $B^{-1}$ to obtain:

$c_{k1}(n) = \left( I - \mu_n B^{-1} y_1(n) y_1(n)^T \right) c_{k1}(n-1) + \mu_n B^{-1} s_{k1}$. (4)

This algorithm is seen to converge more quickly to the MMSE solution
than its single-signal based counterparts.

A recent fundamental development in stochastic approximation is the idea of averaging, as introduced for multidimensional problems in [3]. The stochastic version of the general deterministic rule in (2) can be modified to include an averaging step after the “basic” recursion:

$c_{k1}(n) = (I - \mu_n Q y_1(n) y_1(n)^T)\, c_{k1}(n-1) + \mu_n Q s_{k1}$,  (5)

$\bar{c}_{k1}(n) = \frac{1}{n} \sum_{i=1}^{n} c_{k1}(i)$.  (6)

The “smoothing” effect of the averaging allows the basic recursion step to use “larger” step-sizes than would be feasible for the non-averaged adaptive rule, leading to an improvement in convergence. Analytical convergence for (3) is shown in a manner different from that in [2]. The adaptations with averaging are shown to converge with zero asymptotic mean squared error under the assumption of a completely synchronous system, and almost surely for the asynchronous model.
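
As an illustration only, the deterministic rule (2) and the averaged stochastic rule (5)-(6) might be sketched as follows. The function names, the zero initial vector, and the step-size schedule `mu0 / n**decay` are assumptions made for this sketch and are not taken from the text.

```python
import numpy as np

def deterministic_iteration(A, s, Q, mu, steps):
    """Rule (2) with a fixed step size: c(n) = (I - mu Q A) c(n-1) + mu Q s."""
    c = np.zeros(len(s))
    for _ in range(steps):
        c = c - mu * (Q @ (A @ c - s))
    return c

def averaged_stochastic_iteration(Y, s, Q, mu0, decay=0.6):
    """Rules (5)-(6): A is replaced by y(n) y(n)^T and the iterates are averaged.

    Y holds one received vector per row. The schedule mu_n = mu0 / n**decay
    is an assumed choice, not one prescribed by the text.
    """
    c = np.zeros(len(s))
    c_bar = np.zeros(len(s))
    for n, y in enumerate(Y, start=1):
        mu_n = mu0 / n ** decay
        c = c - mu_n * (Q @ (y * (y @ c) - s))  # basic recursion (5)
        c_bar = c_bar + (c - c_bar) / n         # running mean, equal to (6)
    return c_bar
```

Passing `Q = np.eye(N)` corresponds to the single-signal rule, while passing the inverse of an approximation of $A$ corresponds to the choice made in (4).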

Abstract - In this paper, we investigate a suboptimum version of the iterative multistage maximum likelihood algorithm presented in [2]. Application to Block Coded Modulation is considered. Conditions are introduced which reduce the computational complexity and decoding delay. Simulation results support the claims.

I.  Summary 

Decomposable and multilevel codes [1], such as block coded modulation (BCM) codes, can be efficiently decoded by a multistage decoding (MSD) algorithm. In conventional MSD, the reduction in complexity relative to maximum likelihood decoding (MLD) algorithms is achieved at the expense of an increased error rate. For short codes, the performance degradation is small. The discrepancy between the optimum and MSD algorithms becomes apparent when long codes are used.

The iterative multistage (IMS) MLD [2] can be used to obtain the optimum performance for a given code. This algorithm achieves MLD through iterations with optimality tests at each decoding stage. In [3], another MSD algorithm for decoding multilevel codes, based on list decoding of the outer codes, was presented. The improvement is achieved by passing additional estimates from the first to the second decoding stage when the distance of the decoded codeword from the received sequence is larger than a given threshold.

The algorithms presented here combine those in [2] and [3]. Let $ED^{(i),iter}$ denote the squared Euclidean distance between the received sequence $(r_{ai}, r_{wi})_{i=1,2,\ldots,N}$ and the estimate at the i-th decoding stage of iteration iter. The optimum version of IMS-MLD is based on two simple facts: 1) $ED^{(i),iter} \le ED^{(i+j),iter}$ and $ED^{(i),iter} \le ED^{(i),iter+j}$ for $j > 0$; 2) if the constellation label sequence at the first stage, $CLS^{(1)}$, is a codeword in the BCM code, then no further estimate will be closer to the received sequence than $CLS^{(1)}$ is.

In the suboptimum version, threshold decoding is used to reduce the number of iterations. Because of the new criteria introduced as a modification of [2], optimality is not claimed. However, the simulations show that properly chosen threshold values can lead to good error performance. This algorithm is particularly useful when long codes are used as outer codes in BCM schemes. Both algorithms are given in Figure 1, where dashed lines denote the modifications that lead to the suboptimum version.

Examples  of  several  codes  will  be  presented  to  show  the 
performance  vs.  decoding  complexity. 

1This research was supported by the National Science Foundation under Grants No. NCR-94-15374 and NCR-97-32959, and NASA under Grant NAG 5-931.


References 

[1] H. Imai and S. Hirakawa, “A new multilevel coding method using error correcting codes,” IEEE Transactions on Information Theory, vol. IT-23, no. 3, pp. 371-376, May 1977.

[2] D. Stojanovic, M. P. C. Fossorier, and S. Lin, “Iterative multistage maximum likelihood decoding of multi-level codes,” Proceedings of the 1999 Conference on Coding and Cryptography, Paris, France, January 10-14, 1999.

[3] U. Dettmar, J. Portugheis, and H. Hentsch, “New multistage decoding algorithm,” Electronics Letters, vol. 28, no. 7, pp. 635-636, 1992.


Fig.  1:  Flowchart  of  the  Iterative  Decoding  Algorithm  for  3-level 
code 

Abstract - We investigate the application of the sum-product algorithm to the decoding of a q-ary Block-Coded Modulation (BCM) scheme which is based on extending the parity check equations of a binary block code to q-ary symbols. This is achieved by decomposing the code into a sub-code with an acyclic Tanner graph and its cosets, which are represented by a trellis diagram. The combination of these two cycle-free graphs is used to develop an efficient soft output decoding algorithm for the given code.

I.  Introduction 

Our objective has been to develop a soft output decoding method for the BCM codes proposed in [1]. The corresponding construction is based on extending good binary block codes from GF(2) to $Z_q$. They assume a q-PSK signal constellation where the components of the q-ary code are directly mapped to the q-PSK points using an appropriate labeling. The extension of the binary linear code to a q-ary linear code is based on extending the parity check equations to $\{0, 1, \ldots, q-1\}$, mod q constraints. In this case, the encoder inputs $\log(q^k) = k \log(q)$ bits and outputs a length n codeword of elements of $Z_q = \{0, 1, \ldots, q-1\}$, which are each mapped to the points of a q-PSK constellation. The resulting scheme is 2n-dimensional with a minimum time diversity of MTD = d, and a bandwidth efficiency (BWE) of $\eta = k \log(q)/n = R \log(q)$ bits/2-D symbol ($R = k/n$ is the binary code rate). Therefore the optimality of these codes for a Rayleigh fading channel in terms of MTD and BWE is tantamount to that of the underlying binary block code. These schemes fall into the category of codes over rings and groups, which have recently received a lot of attention among coding theorists [2].
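
As a small worked example of the bandwidth-efficiency expression, the sketch below evaluates $k \log(q)/n$ under the assumption that the logarithm is taken to base 2 (so that the result is in bits). The function name is hypothetical.

```python
import math

def bandwidth_efficiency(n, k, q):
    """Return k*log2(q)/n, in bits per 2-D symbol, for an [n, k] binary code
    extended to Z_q and mapped to q-PSK (base-2 logarithm assumed)."""
    return k * math.log2(q) / n

# Example: a rate-1/2 binary code of length 8 with 8-PSK
print(bandwidth_efficiency(8, 4, 8))
```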

II.  Decoding 

Consider the communication system in Figure 1, where k information bits $u = (u_1, u_2, \ldots, u_k)$ are first encoded to n channel symbols $x = (x_1, x_2, \ldots, x_n)$ and then transmitted through the channel, which outputs $y = (y_1, y_2, \ldots, y_n)$. The channel is memoryless, such that each channel output $y_i$ is related only to the channel input at the same time, namely $x_i$, by $y_i = a_i x_i + n_i$, where $a_i$ is 1 for an AWGN channel and Rayleigh-distributed for a Rayleigh fading channel. For probability propagation decoding, one can construct a probabilistic model for the system by examining the encoding process and the channel. Then, a soft output decoding method that maximizes $\Pr(x_i \mid y)$, $i = 1, 2, \ldots, n$, will minimize the symbol (q-ary) error probability. The sum-product algorithm provides an efficient way of calculating such marginals using a graphical representation of the code [3, 4].

1This work was supported by Natural Sciences and Engineering Research Council of Canada (NSERC).

2This work was supported by Communications and Information Technology Ontario (CITO).

Figure 1: General Memoryless Coding Channel.
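
To make the target quantity concrete, the following sketch computes the symbol marginals $\Pr(x_i \mid y)$ by brute-force enumeration of a small code over $Z_q$. It is not the sum-product algorithm itself, only a reference that an efficient decoder would have to reproduce. The parity-check description `H`, the natural labeling of the q-PSK points, and the Gaussian noise model with variance `sigma2` per real dimension are all assumptions of this sketch.

```python
import itertools
import numpy as np

def symbol_posteriors(H, q, y, a, sigma2):
    """Brute-force Pr(x_i = s | y) for the code {x in Z_q^n : H x = 0 mod q}.

    y: complex received vector, a: real fading gains (all ones for AWGN).
    Equally likely codewords and natural q-PSK labeling are assumed.
    """
    H = np.asarray(H)
    n = H.shape[1]
    points = np.exp(2j * np.pi * np.arange(q) / q)  # q-PSK constellation
    post = np.zeros((n, q))
    for x in itertools.product(range(q), repeat=n):
        x = np.array(x)
        if np.any(H @ x % q):          # skip words violating a check equation
            continue
        dist2 = np.abs(y - a * points[x]) ** 2
        weight = np.exp(-dist2.sum() / (2 * sigma2))
        post[np.arange(n), x] += weight  # accumulate codeword likelihood
    return post / post.sum(axis=1, keepdims=True)
```

The enumeration costs $q^n$ operations, which is exactly the cost that a cycle-free graphical representation is meant to avoid.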

In [1], a 2-level decoding method based on a generalization of the method discussed in [5] is proposed. To decode the constructed BCM scheme, the codebook, which has a cyclic Tanner graph (TG), is decomposed into a sub-code with an Acyclic Tanner Graph (ATG) and its cosets. The significance of representing the code by an ATG is that one can use a generalization of the well-known Wagner rule for its decoding. A composite Tanner graph-Trellis (TG-T) is used to represent the code structure.

On the other hand, it is well known that probability propagation algorithms for soft output decoding, e.g., the BCJR algorithm, can be used on a cycle-free graph to produce an exact probability calculation of code symbols. Examples of such cycle-free graphs include a trellis representation and a cycle-free Tanner graph. The focus of the current article is to use the TG-T representation of the code (which is based on the combination of two cycle-free graphical representations) to produce an efficient soft output decoding method.

The bit error performance of the resulting code construction will depend on the method used for the bit labeling of the underlying q-PSK constellation. We will present a discrete optimization method to optimize such labeling to minimize the resulting bit error probability.

Abstract - In this paper, we apply the iterative Viterbi algorithm (IVA) to decode a concatenated multidimensional TCM in which a trellis code is used as the inner code and a simple even parity code is used as the outer code.

I.  Introduction 

In this work, we extend the iterative Viterbi algorithm (IVA) [2][3] to concatenated multidimensional (MD) trellis codes. With a simple BCH code, at BER = $4.4 \times 10^{-6}$, about 2.2 dB additional net gain can be achieved using the IVA for the 4D 16-state Wei's code [1] at a spectral efficiency of 7 bits/T.

II.  Encoding  the  Concatenated  MD  TCM 

In Fig. 1, there are (m - 1) information streams organized into a block of (m - 1) rows. The mth stream, called the parity-check (PC) stream, is then generated in such a way that the trellis-encoded bits in the mth stream will be the parity of the trellis-encoded bits of the (m - 1) information streams, i.e., $I_m = \bigoplus_{j=1}^{m-1} I_j$. The non-trellis-encoded bits in the mth stream are left intact.
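
The parity-check stream construction can be illustrated with a short sketch. It operates on an array of already trellis-encoded bits, one stream per row, and ignores the differential-encoding issue; the function name and array layout are assumptions for illustration.

```python
import numpy as np

def add_parity_stream(info_bits):
    """Append an m-th row equal to the bitwise XOR of the (m - 1) rows above.

    info_bits: array of shape (m - 1, T) holding 0/1 trellis-encoded bits.
    """
    info_bits = np.asarray(info_bits, dtype=np.uint8)
    parity = np.bitwise_xor.reduce(info_bits, axis=0)
    return np.vstack([info_bits, parity])

streams = add_parity_stream([[1, 0, 1, 1],
                             [0, 0, 1, 0],
                             [1, 1, 1, 0]])
# Every column of `streams` now has even parity.
```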

We note that the differential encoder in Wei's code design is a nonlinear operation. Therefore, to prevent the PC property from being violated after the differential encoding, we impose the PC condition on the trellis-encoded bits of all the m streams only after the differential encoding.


III. Decoding the Concatenated MD TCM Using the IVA

Let $R_i^t$ ($i = 1, 2, \ldots, m$) denote the received 4D signals of the m streams at time t. Let $Z_i^t$ and $Z_i^{t+1}$ denote the 2D encoded codewords of the ith stream in the first and second 2D sub-constellations at time t, respectively.

In the IVA, the likelihood function of the 4D type in the (m - 1)th stream at time t is

$-\log[P(R_1^t, R_2^t, \ldots, R_m^t \mid Z_{m-1}^t, Z_{m-1}^{t+1})] \approx a_{m-1}^t + \Lambda_{m-1}^t$,  (1)

where $a_{m-1}^t$ denotes the branch metric value, which is identical to the metric used in the VA, and $\Lambda_{m-1}^t$ denotes the extrinsic metric value introduced by the PC condition from the other streams. The metric function $\Lambda_{m-1}^t$ is equal to the metric used in the VA for the mth stream, but it is “controlled” by the estimated PC condition, where $\hat{I}_j$ is the decision of $I_j$ in the previous iteration.

IV. Numerical Results

Figure 2 presents the performance of the 4D 16-state Wei's code using the 2D IVA with and without the BCH code at a spectral efficiency of 7 bits/T. It is shown that at the BER = $4.4 \times 10^{-6}$ level about 2.7 dB gross gain or 2.2 dB net gain is achieved.

Figure 1: ID structure consisting of the m streams

Figure 2: Performance of the 4D 16-state Wei's codes using the VA and 2D IVA at a spectral efficiency of 7 bits/T


V.  CONCLUSIONS 

Significant gains for concatenated 4D trellis codes using the IVA over the use of the VA can be obtained with low complexity and reasonable computation. The cases of 8D and higher-dimensional concatenated TCM can be handled in a similar way. More results about the performance of other decoding algorithms can be found in [4].

Abstract - A multilevel coding approach to the construction of multidimensional (MD) GU trellis codes is considered. We present a family of GU trellis codes with good trade-off between SNR, decoding complexity with stage decoding, and coding rate near the cut-off rate.

I.  Introduction 

For a transmission rate of hn bits per h channel symbols over an AWGN channel employing 2D QAM signals, almost all TCM schemes known in the literature assume that the modulator has twice as many signal h-tuples as strictly needed. The redundancy of the trellis code is then 1 bit per h symbols, and it has been shown by Ungerboeck that very little incremental gain can be achieved by further increasing the modulator alphabet redundancy. It seems that the only way to make the trellis code more powerful is to increase the number of encoder states. The problems are the maximum-likelihood decoding complexity and the lack of algebraic methods to synthesize good codes with many states: the only practical way is to generate a large number of codes in a small class where one expects to find the best codes, followed by their performance evaluation. However, an exhaustive search for good codes with many states still remains difficult even for the class of Geometrically Uniform (GU) codes [1], whose symmetry properties and algebraic structure permit an efficient search for good codes.

In  this  paper,  we  show  that  a  possible  way  to  solve  the 
problem  is  to  allow  the  code  to  have  more  redundancy  than 
one  bit  so  that  a  stage  construction  [2]  can  be  used  to  simplify 
the  search  for  good  codes  with  many  states  in  connection  with 
reduced  decoding  complexity,  using  stage  decoding. 

II.  Multilevel  Code  Construction 

A $2^{n+1}$-point QAM signal constellation S is partitioned in two steps according to the GU partition chain $Z^2/RZ^2/2RZ^2$ using a reflection v about the vertical axis, a reflection g about the origin, and a translation r by (0,2). This gives an 8-way GU partition of S isomorphic to $Z_2^3$. For a positive integer $h \ge 2$, the encoder structure for 2h-D GU trellis codes is shown in Fig. 1. As a multilevel code, one bit of the code $C_1$, which is the best $2^{\nu_1}$-state rate-$1/h$ binary convolutional code, identifies a coset in $Z^2/RZ^2$, and a pair of bits of the code $C_2$, which is the $2^{\nu_2}$-state rate-$(2h-1)/2h$ binary convolutional code taken from the best 2h-D trellis codes employing a 4-way partition of the QAM constellation, selects a coset in $RZ^2/2RZ^2$ each time. The third level is uncoded. As an encoder for an MD trellis code, the scheme generates a 2h-D GU trellis code C with $2^{\nu_1+\nu_2}$ states and a transmission rate of n bit/sym. For h = 2, 3, 4, we have found codes with good performance/complexity trade-off with stage decoding. Table 1 shows the search result for h = 2. The encoder's generators are given in octal. Effective coding gain is in decibels.
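
The rate bookkeeping of the three levels can be checked with a few lines of arithmetic. The sketch below counts information bits per 2h-D symbol under the reading that the uncoded level carries the remaining $(n+1) - 3$ bits of each 2D symbol; that reading and the function name are assumptions of this example.

```python
from fractions import Fraction

def bits_per_2d_symbol(n, h):
    """Information bits per 2D symbol for the three-level construction."""
    level1 = 1            # C1: rate 1/h, h coded bits per 2h-D symbol
    level2 = 2 * h - 1    # C2: rate (2h-1)/2h, 2h coded bits per 2h-D symbol
    level3 = h * (n - 2)  # uncoded: (n + 1) - 3 bits in each of the h 2D symbols
    return Fraction(level1 + level2 + level3, h)

# 64QAM corresponds to n + 1 = 6, i.e. n = 5
print(bits_per_2d_symbol(5, 2))
```

The total is hn bits per h channel symbols for every h, which agrees with the transmission rate of n bit/sym stated above.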


III.  Performance  Analysis 

Figure 2 shows cut-off rates of the two-step partition of the 64QAM constellation. The code design region for transmission of 5 bit/sym is shaded. For h = 2, 3, 4, rates of component codes are shown together with the SNR required for stage decoding to perform well [3]. In this region, the rate and decoding complexity are traded off against the SNR.

Fig. 1: The encoder structure

Fig. 2: Cut-off rates of the partition of 64QAM. Design rates for constituent codes (h = 2, 3, 4).

Tab. 1: 4D GU trellis codes (h = 2).

Abstract - We show that the minimum distance d of a linear code is not approximable to within any constant factor in random polynomial time (RP), unless NP equals RP. In the process we show that it is hard to find the nearest codeword even if the number of errors exceeds d/2 by an arbitrarily small fraction $\epsilon d$.

I.  Introduction 

Consider a linear code $A[n, k, d]_q$ with generator matrix $A \in F_q^{k \times n}$. We study the complexity of the following problems:

• Approximate the Minimum Distance d of a linear code A;

• Find the Nearest Codeword y for the received vector x.

Vardy [5] proved that it is NP-hard to compute d explicitly. The (second) Nearest Codeword Problem (NCP) was proven to be NP-hard in [3]. More generally, we can consider decoding complexity given relatively low error weight. For real $\rho$, this gives the Relatively Near Codeword Problem RNC$^{(\rho)}$:

Given a generator matrix $A \in F_q^{k \times n}$ of a linear code A of minimum distance d, an integer t with the promise that $t < \rho \cdot d$, and a received word $x \in F_q^n$, find a codeword within distance t from x. (The algorithm may fail if the promise is violated, or if no such codeword exists.)

In particular, $\rho = 1/2$ in the “Bounded distance decoding problem”. Until recently, not much was known about RNC$^{(\rho)}$ for constants $\rho < \infty$, let alone $\rho = 1/2$. Now we show that RNC$^{(\rho)}$ is NP-hard (under random reductions) for every $\rho > 1/2$. This result brings us closer to an eventual (negative?) resolution of the bounded distance decoding problem.

We also show that the minimum distance is hard to approximate within any constant factor, unless NP = RP (i.e., every problem in NP has a polynomial time probabilistic algorithm that always rejects No instances and accepts Yes instances with high probability). In our work, we adapt the proofs of results for integer lattices obtained in [2] and [4], by using linear codes that surpass random codes.

II.  Approximation  Problems 
A promise problem is a generalization of a decision problem in which some strings are not required to be either a Yes or a No instance. However, given a string with the promise that it is either a Yes or a No instance, one has to decide which of the two sets it belongs to. Below we use $A \in F_q^{k \times n}$, $v \in F_q^n$, and $t \in Z^+$. Also, q is a prime power, $\gamma \ge 1$, and $\rho > 0$.

Definition 1 (Minimum Distance Problem)

An instance of GapDist$_{\gamma,q}$ is a pair (A, d), such that:

(A, d) is a Yes instance if $d(A) \le d$;

(A, d) is a No instance if $d(A) > \gamma \cdot d$.

1This  work  was  supported  by  the  NSF  grant  NCR-9703844. 

2This  work  was  supported  by  a  Sloan  Foundation  Fellowship, 
an  MIT-NEC  Research  Initiation  Grant  and  NSF  Career  Award 
CCR-9875511. 


Definition 2 (Nearest Codeword Problem)

An instance of GapNCP$_{\gamma,q}$ is a triple (A, v, t), such that:

(A, v, t) is a Yes instance if $d(v, A) \le t$;

(A, v, t) is a No instance if $d(v, A) > \gamma t$.

Definition 3 (Relatively Near Codeword Problem)

An instance of GapRNC$^{(\rho)}_{\gamma,q}$ is a triple (A, v, t), such that:

$t < \rho \cdot d(A)$;

(A, v, t) is a Yes instance if $d(v, A) \le t$;

(A, v, t) is a No instance if $d(v, A) > \gamma t$.
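
For very small codes the quantities in Definitions 1-3 can be evaluated by exhaustive search, which may help in reading the promise conditions. The sketch below assumes a prime q (so that arithmetic modulo q is field arithmetic) and a generator matrix with at least one nonzero codeword; all function names are hypothetical, and the running time is exponential in k.

```python
import itertools
import numpy as np

def codewords(A, q):
    """Yield all q^k codewords generated by the k x n matrix A over F_q (q prime)."""
    A = np.asarray(A)
    for m in itertools.product(range(q), repeat=A.shape[0]):
        yield np.array(m) @ A % q

def min_distance(A, q):
    """d(A): least Hamming weight of a nonzero codeword."""
    return min(int(np.count_nonzero(c)) for c in codewords(A, q) if c.any())

def dist_to_code(A, v, q):
    """d(v, A): Hamming distance from v to the nearest codeword."""
    v = np.asarray(v)
    return min(int(np.count_nonzero((c - v) % q)) for c in codewords(A, q))

def gap_dist(A, d, gamma, q):
    """Classify (A, d) as in Definition 1; None means the promise is violated."""
    dA = min_distance(A, q)
    if dA <= d:
        return "YES"
    if dA > gamma * d:
        return "NO"
    return None
```

The hardness results concern the existence of polynomial-time procedures that replace this exhaustive search, not the search itself.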

Our reduction uses the promise problem GapNCP$_{\gamma,q}$, which is proved to be NP-hard [1] for every constant $\gamma \ge 1$. It is also hard [1] to approximate d(v, A) to within a factor of $2^{\log^{(1-\epsilon)} n}$ for any $\epsilon > 0$, unless NP $\subseteq$ QP (deterministic quasi-polynomial time).

We also use polynomial reverse unfaithful random reductions (RUR-reductions). Given a security parameter s, these probabilistic algorithms require poly(s) time to necessarily map No instances to No instances, and Yes instances to Yes instances with high probability $1 - q^{-s}$.

Theorem 4 For any $\rho > 1/2$, $\gamma \ge 1$ and any finite field $F_q$:
GapRNC$^{(\rho)}_{\gamma,q}$ is NP-hard under polynomial RUR-reductions;
GapDist$_{\gamma,q}$ is NP-hard under polynomial RUR-reductions;
GapDist$_{\gamma,q}$ is NP-hard under quasi-polynomial RUR-reductions for $\gamma(n) = 2^{\log^{(1-\epsilon)} n}$.

For  further  details,  see  [6]. 

Abstract - An updated table of parameters for binary and ternary quadratic-residue codes of length up to 200 resp. 100 is presented. In particular, we find that the minimum distance of the binary [167, 83] quadratic-residue code is 24.

I.  Preliminaries 

More than twenty years ago, MacWilliams and Sloane posed the computation of the minimum distance of binary and ternary quadratic-residue (QR) codes as a research problem (Research Problem (16.1) in [1]). Some of the missing minimum distances were presented in [2]. The increase of computing power in the last decades made it possible to find the minimum distance of the [137, 69] binary QR code [3] and the minimum distance of the [83, 42] ternary QR code [4]. The problem was solved by the built-in algorithm of the computer algebra system MAGMA [5]. Using some theoretical results, we were able to determine the minimum distance of the [167, 83] binary QR code and to improve some of the bounds on the minimum distance presented in [1, Fig. 16.1].

II.  Quadratic-Residue  Codes 

Let p and l be prime integers such that l is a quadratic residue modulo p. Furthermore, let Q denote the set of quadratic residues modulo p. Then the polynomial $x^p - 1$ can be factored over GF(l) as $x^p - 1 = (x - 1) q(x) n(x)$, where the roots of q(x) are $\alpha^r$ for all non-zero quadratic residues $r \in Q$ and $\alpha$ is a primitive pth root of unity in an extension field of GF(l). The quadratic-residue (QR) codes $\mathcal{Q}$, $\bar{\mathcal{Q}}$, $\mathcal{N}$, and $\bar{\mathcal{N}}$ are the cyclic codes of length p over GF(l) generated by q(x), (x - 1)q(x), n(x), and (x - 1)n(x), resp. The extended quadratic-residue codes $\hat{\mathcal{Q}}$ and $\hat{\mathcal{N}}$ are obtained by adding an overall parity check to $\mathcal{Q}$ and $\mathcal{N}$, resp. A lower bound on the minimum distance $d_p$ of the quadratic-residue code $\mathcal{Q}_p$ of length p is given by $d_p \ge \sqrt{p}$. The minimum distance of the extended code $\hat{\mathcal{Q}}_p$ and of the expurgated code $\bar{\mathcal{Q}}_p$ is $\hat{d}_p = d_p + 1$. For $p \equiv 3 \pmod 4$, $\hat{\mathcal{Q}}$ is self-dual. Hence l must be 2 or 3 (see Theorem 1 in [1, Ch. 19, §1]). Then, for l = 2, $\hat{\mathcal{Q}}$ is doubly-even, and for l = 3, all weights are multiples of 3 (see Theorem 8 in [1, Ch. 16, §4]^1).
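
The existence condition (l is a quadratic residue modulo p) and the self-duality condition ($p \equiv 3 \pmod 4$) are easy to check numerically. The helper names below are hypothetical and the sketch does not construct the codes themselves.

```python
def quadratic_residues(p):
    """Set of non-zero quadratic residues modulo the odd prime p."""
    return {(x * x) % p for x in range(1, p)}

def qr_code_exists(p, l):
    """True if l is a quadratic residue modulo p, as the construction requires."""
    return l % p in quadratic_residues(p)

for p, l in [(167, 2), (191, 2), (97, 3), (107, 3)]:
    print(p, l, qr_code_exists(p, l), p % 4 == 3)
```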

III.  Results 

In  Table  1,  parameters  of  binary  and  ternary  extended  QR 
codes  of  length  up  to  200  resp.  100  are  presented.  Differences 
to  the  original  version  [1,  Fig.  16.2]  are  marked  and  references 
are  given.  Here  we  briefly  discuss  our  results: 

For the [167, 84] binary code $\mathcal{Q}_{167}$, the lower bound in [1] is $d_{167} \ge 15$, the upper bound is $d_{167} \le 23$. As $\hat{\mathcal{Q}}_{167}$ is doubly even, candidates for its minimum distance are 16, 20, and 24. To show that the true minimum is 24, it is sufficient to show $\hat{d}_{167} \ge 21$ resp. $d_{167} \ge 20$. The lower bound $d_{167} \ge 20$ was established using MAGMA V2.5-1 in about 8 days on a SUN Ultra 5 running at 360 MHz.

1Note that in Theorem 8 (ii) in [1] no restriction on p is made, but the proof uses Theorem 7 which requires $p \equiv 3 \pmod 4$.

Tab. 1: Parameters of extended quadratic-residue codes $\hat{\mathcal{Q}}$ (updated version of [1, Fig. 16.2, p. 483]).

(a) Over GF(2)

n    k    d          n    k    d          n    k    d
8    4    4          74   37   14         138  69   22^c
18   9    6          80   40   16         152  76   20
24   12   8          90   45   18         168  84   24^e
32   16   8          98   49   16         192  96   24^e-28
42   21   10         104  52   20         194  97   22^f-28
48   24   12         114  57   16^a       200  100  24^g-32
72   36   12         128  64   20

(b) Over GF(3)

n    k    d          n    k    d          n    k    d
12   6    6          48   24   15         74   37   18^a
14   7    6          60   30   18         84   42   20^d
24   12   9          62   31   12^a       98   49   21^h-24
38   19   11^b       72   36   18^a       108  54   21^i-27

New entries: ^a see [2], ^b see [2, 6], ^c see [3], ^d see [4], ^e $d \ge 21$ and doubly-even, ^f $d \ge 22$ and even, ^g $d \ge 22$ and doubly-even, ^h $d \ge 21$, ^i $d \ge 19$ and $d \equiv 0 \pmod 3$.

For the [192, 96] binary code, we have enumerated all approx. $2^{40}$ vectors of information weight $r \le 9$. The lowest weight encountered was 28, showing $d \ge 20$ for the [191, 96] code and $\hat{d} \ge 21$ for its extension. Again, the extended code is doubly even, hence $\hat{d} = 24$ or $\hat{d} = 28$. The lower bounds for the codes of length 194 and 200 were obtained similarly.

The only ternary code of [1, Fig. 16.2] whose minimum distance remains unknown is $\hat{\mathcal{Q}}_{97}$. Enumeration revealed $21 \le \hat{d}_{97} \le 24$. Additionally, we considered the [108, 54] ternary code $\hat{\mathcal{Q}}_{107}$. This code is self-dual, hence $\hat{d}_{107} \equiv 0 \pmod 3$. Enumeration showed $\hat{d}_{107} \ge 19$, hence $\hat{d}_{107} \ge 21$. The other possible values are 24 and 27.

Abstract - We show that for a code used for error detection or combined error correction and detection in the binary symmetric channel, the probability of an undetected error can have several local maxima.

In particular, we construct a code with three local maxima in (0, 1/2), a code with five local maxima in (0, 1); and a linear code with two local maxima in (0, 1/2) and a linear code with three local maxima in (0, 1).

I.  Introduction 

Let t be a given non-negative integer and C a binary (n, M, d) code with $d \ge 2t + 1$. Denote by $B_r(x)$ the Hamming sphere of radius r centered at x and $B_i = \frac{1}{M} \cdot |\{(a, b) \in C \times C \mid d(a, b) = i\}|$. Assume that x and y are given, and d(x, y) = i. We denote by $P_i^{(t)}(p)$ the probability that x changes to a vector in $B_t(y)$ in the binary symmetric channel with transition probability p. The probability of undetected error after using C to correct t or fewer errors is given by

$P_{ue}^{(t)}(C, p) = \sum_{i=1}^{n} B_i P_i^{(t)}(p)$.  (1)

Kløve and Korzhik [3] give an excellent account of error detecting codes. They have studied a large number of codes, and ask [3, p. 227] whether it is true that for every code (or for every linear code) the function $P_{ue}^{(t)}(C, p)$ has at most one maximum in the interval (0, 1/2) and at most two maxima in [0, 1]. We answer this question in the negative.

II.  The  approach 

When we are only interested in the number of local maxima, we can multiply the polynomial (1) by the constant M/2, and instead consider the polynomial

$Q^{(t)}(C, p) = \sum_{i=1}^{n} q_i P_i^{(t)}(p)$,

where now $q_i$ tells how many of the $\binom{M}{2}$ pairwise distances between the codewords are equal to i. Consequently,

$\sum_{i} q_i = \binom{M}{2}$.  (2)

1  This  work  was  supported  by  the  Academy  of  Finland  under 
grant  #46186. 


Since the distance between two codewords is even if and only if both of them have even weight or both of them have odd weight, we know that

$\sum_{i \text{ odd}} q_i = K(M - K)$,  (3)

where K denotes the number of codewords of odd weight.

Our approach consists of two steps. We first try to find a polynomial $R(p) = \sum_{i=1}^{n} r_i P_i^{(t)}(p)$ with non-negative integer coefficients which has a prescribed number of local maxima and which satisfies (2) and (3) for some M and K. On the other hand, we have the goal of making M as small as possible.

The second step then consists of constructing a code C such that the $\binom{M}{2}$ pairwise distances between the codewords have the required distribution, i.e., the coefficients $q_i$ of $Q^{(t)}(C, p)$ equal the $r_i$'s. Such a code cannot of course exist unless the required distance distribution satisfies the Delsarte inequalities.

Using this method we are able to construct the following nonlinear and linear codes.

Theorem 1 Let t = 0, 1. There exists a code C such that the function $P_{ue}^{(t)}(C, p)$ has three local maxima in the interval (0, 1/2) and a code C' such that the function has five maxima in (0, 1).

Theorem 2 Let t = 0, 1. There exists a linear code $C_l$ such that the function $P_{ue}^{(t)}(C_l, p)$ has two maxima in (0, 1/2) and a code $C_l'$ such that the function has three maxima in (0, 1).

Example 1 We consider now the case t = 2. Let us extend twice the [31, 5] simplex code. We furthermore take the vector 1111100...0 of length 33 as a codeword to obtain a (33, 33) code. The code gives $q_5 = 1$, $q_{13} = 4$, $q_{15} = 16$, $q_{16} = 496$, $q_{17} = 8$ and $q_{21} = 3$ (other $q_i$'s equal zero). This code yields two maxima for $P_{ue}^{(t)}(C, p)$ in (0, 1/2). The first one is at $p \approx 0.10$ and the second at $p \approx 0.48$.
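
The necessary conditions (2) and (3) are simple to verify numerically. The sketch below computes the pairwise distance distribution of an arbitrary small code and checks both identities, then applies the check to the coefficients quoted in Example 1 (M = 33 codewords, of which only the added vector has odd weight, so K = 1). Function names are hypothetical.

```python
from itertools import combinations
from math import comb

def distance_distribution(code):
    """q[i] = number of unordered codeword pairs at Hamming distance i."""
    n = len(code[0])
    q = [0] * (n + 1)
    for a, b in combinations(code, 2):
        q[sum(x != y for x, y in zip(a, b))] += 1
    return q

def satisfies_identities(q, M, K):
    """Check identity (2) (total pair count) and identity (3) (odd distances)."""
    total_ok = sum(q) == comb(M, 2)
    odd_ok = sum(q[1::2]) == K * (M - K)
    return total_ok and odd_ok

# Coefficients quoted in Example 1, indexed by distance 0..33
q_example = [0] * 34
for i, value in {5: 1, 13: 4, 15: 16, 16: 496, 17: 8, 21: 3}.items():
    q_example[i] = value
print(satisfies_identities(q_example, M=33, K=1))
```

These identities are only necessary conditions; the Delsarte inequalities mentioned above are a separate requirement that this sketch does not test.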

Abstract - We use projective multisets (projective systems) to find upper bounds on the weight hierarchies for a special class of codes, namely the extremal non-chain codes. Several code constructions exist meeting the bounds with equality.

I.  Introduction 

Let C be a linear q-ary code of dimension k and length n. The weight w(S) of a subcode $S \subseteq C$ is the number of positions where at least one word in S differs from zero. The rth generalised Hamming weight $d_r$ of C is the least weight of an r-dimensional subcode of C. The sequence $(d_1, d_2, \ldots, d_k)$ is called the weight hierarchy of C [6].
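
For a small binary code the weight hierarchy can be computed directly from the definition. The sketch below enumerates sets of r linearly independent codewords and uses the fact that the support of a subcode is the union of the supports of any basis. It assumes a full-rank binary generator matrix; the function names are hypothetical and the running time is exponential.

```python
from itertools import combinations, product

def independent(vectors):
    """Linear independence over GF(2) for vectors stored as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clear the leading bit of b if present in v
        if v == 0:
            return False
        basis.append(v)
    return True

def weight_hierarchy(G):
    """Generalised Hamming weights (d_1, ..., d_k) of the binary code with
    full-rank generator matrix G (list of 0/1 rows)."""
    k = len(G)
    rows = [int("".join(map(str, r)), 2) for r in G]
    words = set()
    for coeffs in product((0, 1), repeat=k):
        w = 0
        for c, r in zip(coeffs, rows):
            if c:
                w ^= r
        words.add(w)
    words.discard(0)
    hierarchy = []
    for r in range(1, k + 1):
        best = None
        for basis in combinations(sorted(words), r):
            if not independent(basis):
                continue
            support = 0
            for b in basis:
                support |= b    # support of the span = union of basis supports
            weight = bin(support).count("1")
            if best is None or weight < best:
                best = weight
        hierarchy.append(best)
    return hierarchy
```

Applied to a generator matrix of the [7, 4] Hamming code, this should return the known hierarchy (3, 5, 6, 7).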

II.  Extremal  Non-Chain  Codes 

The chain condition was introduced in [7], and states that
there is a chain $D_0 \subset \cdots \subset D_k$ of subcodes, where $D_i$ has
dimension i and weight $d_i$.

At the opposite extreme are the extremal non-chain codes,
defined as follows. For each pair (i,j) where $1 \le i < j < k$,
there are no subcodes $D_i \subset D_j$ of dimensions i and j respectively
such that $w(D_i) = d_i$ and $w(D_j) = d_j$. The extremal
non-chain codes were introduced by Chen and Kløve [1], and
this study continues their work.

III.  Projective  Multisets 

Let G be a $k \times n$ generator matrix of C. By permuting
columns  of  G  or  by  multiplying  certain  columns  by  non-zero 
scalars,  we  get  an  equivalent  code.  Equivalent  codes  have  the 
same  weight  hierarchy. 

Let PG(k-1, q) be the projective (k-1)-space over the finite
field with q elements. The code C is determined up to equivalence
by giving the map $\gamma : PG(k-1, q) \to \{0, 1, \ldots\}$, saying
how many times each projective point occurs as a column in
G. Such a map is called a projective multiset [2], a projective
system [5], or a value assignment [1, 4]. The definition of $\gamma$ is
extended by $\gamma(S) = \sum_{x \in S} \gamma(x)$ for all $S \subseteq PG(k-1, q)$. The
number $\gamma(S)$ is called the value of S.

We know [3, 5] that a subcode $D_r$ of dimension r and weight
w corresponds to a subspace $S_r \subseteq PG(k-1, q)$ of dimension
k - r - 1 and value $\gamma(S_r) = n - w$. Hence a subcode $D_r$
of minimum weight corresponds to a projective subspace $S_r$ of
maximum value. Also, if $D_r \subset D_{r'}$, then $S_r \supset S_{r'}$.
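The case r = 1 of this correspondence can be checked numerically: a one-dimensional subcode is spanned by a codeword mG, and the matching subspace is the hyperplane of points x with m.x = 0. The sketch below assumes q = 2 and a hypothetical 3 x 9 generator matrix whose columns are given as 3-bit integers. None of these names or values come from the abstract.

```python
from collections import Counter

K = 3  # code dimension (illustrative)
# Columns of a hypothetical 3 x 9 binary generator matrix, as 3-bit ints;
# the points 1 and 7 of PG(2,2) each occur twice.
COLUMNS = [1, 2, 3, 4, 5, 6, 7, 7, 1]
gamma = Counter(COLUMNS)  # projective multiset: point -> multiplicity

def dot(m, x):
    """Inner product over GF(2) of two K-bit vectors stored as ints."""
    return bin(m & x).count("1") & 1

def codeword_weight(m):
    """Weight of the codeword m*G: number of columns x with m.x = 1."""
    return sum(dot(m, x) for x in COLUMNS)

def hyperplane_value(m):
    """gamma(S) for the hyperplane S = {x : m.x = 0} of PG(K-1, 2)."""
    points = [x for x in range(1, 2 ** K) if dot(m, x) == 0]
    return sum(gamma[x] for x in points)

n = len(COLUMNS)
for m in range(1, 2 ** K):
    print(m, codeword_weight(m), hyperplane_value(m), n - codeword_weight(m))
```

In every row the last two printed numbers agree, which is the statement value = n - w for r = 1.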

The difference sequence $(\delta_0, \delta_1, \ldots, \delta_{k-1})$ is defined by
$\delta_i = d_{k-i} - d_{k-i-1}$. The difference sequence is easily computed
from the weight hierarchy and vice versa. If S is an i-space
of maximum value, then $\gamma(S) = \delta_0 + \delta_1 + \ldots + \delta_i$. A difference
sequence corresponding to an extremal non-chain code is
called an ENDS (Extremal Nonchain Difference Sequence).
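The conversion in both directions can be sketched in a few lines. The sketch assumes the convention $d_0 = 0$, which the text does not state explicitly, and the function names are hypothetical.

```python
def difference_sequence(hierarchy):
    """(d_1..d_k) -> (delta_0..delta_{k-1}), delta_i = d_{k-i} - d_{k-i-1}.

    Assumes the convention d_0 = 0.
    """
    d = [0] + list(hierarchy)
    k = len(hierarchy)
    return [d[k - i] - d[k - i - 1] for i in range(k)]

def hierarchy_from_differences(deltas):
    """Inverse map: d_r = d_k - (delta_0 + ... + delta_{k-r-1})."""
    k = len(deltas)
    d_k = sum(deltas)
    return [d_k - sum(deltas[:k - r]) for r in range(1, k + 1)]

print(difference_sequence([3, 5, 6, 7]))
print(hierarchy_from_differences([1, 1, 2, 3]))
```

With the weight hierarchy (3, 5, 6, 7) the difference sequence is (1, 1, 2, 3), and the second call recovers the hierarchy.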


IV.  Results 

Theorem 1 (General Bound) If $(\delta_0, \delta_1, \ldots, \delta_{k-1})$ is an
ENDS, $1 \le m \le k-2$, then


If  equality  holds  for  m  =  m,  then  equality  holds  for  all  m  <  m. 

Theorem 2 (Binary Codes) If $(\delta_0, \delta_1, \ldots, \delta_{k-1})$, $k \ge 4$, is
a binary ENDS, then

5k-2<2k~3Si-2-2k-3. 

Theorem 3 (Total Value) If $(\delta_0, \delta_1, \ldots, \delta_{k-1})$, $k \ge 3$, is an
ENDS, then

m_1  nk~m  —  1 

7(PG(fc-l,g))<  £  *  +  1)! q  , 

i=0  ” 

for all m such that $1 \le m \le k-2$.

Explicit constructions meeting the bounds with equality
exist in dimension 5 and less, provided $\delta_0$ is sufficiently large;
$\delta_0 > 5$ is sufficient in all cases.
Abstract - Nonquasicatastrophic maximum transition run
(MTR) codes are introduced by defining a new t-constraint.
Finite state transition diagrams (FSTD) exhaustively
characterizing MTR(j,k,t) constraints for detector trellises that
are unconstrained or incorporate the j-constraint are presented
and their capacity is computed. It is shown that (G,I)
constrained systems are a subclass of (j,k,t) MTR constrained
systems.

I.  Introduction 

Consider a recording channel consisting of a modulation encoder,
precoder, generalized partial response channel, Viterbi detector,
inverse precoder and modulation decoder. Let $\{b_i\} \in B$ denote the
input of the modulation encoder, where $b_i \in \{0, 1\}$ and B is the set
of all binary sequences. The modulation encoder generates binary
sequences $\{x_i\} \in X$ that satisfy a desired constraint, such as a (G,I)
constraint or a maximum transition run constraint. The precoder is
usually of the form $1/(1 \oplus D)$ or $1/(1 \oplus D^2)$ and its output is
denoted by $\{y_i\} \in Y$, where Y denotes the set of all possible channel
input sequences.

The class of generalized partial-response channel polynomials of
the form $F(D) = (1 - D^2)(1 - P(D))$ is studied, where the whitening
filter $1 - P(D)$ has no roots on the unit circle. The Viterbi detector
provides an estimate $\{\hat{y}_i\} \in \hat{Y}$ of the channel input sequence, where
Y is usually a proper subset of $\hat{Y}$. The output sequences of the
inverse precoder are denoted by $\{\hat{x}_i\} \in \hat{X}$.
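The precoder and inverse precoder named above are simple GF(2) recursions. The sketch below assumes a zero initial state, which the text does not specify, and uses hypothetical function names. `delay=1` corresponds to $1/(1 \oplus D)$ and `delay=2` to $1/(1 \oplus D^2)$.

```python
def precode(x, delay=1):
    """1/(1 XOR D^delay) precoder: y[i] = x[i] XOR y[i - delay].

    A zero initial state is assumed for illustration.
    """
    y = []
    for i, bit in enumerate(x):
        prev = y[i - delay] if i >= delay else 0
        y.append(bit ^ prev)
    return y

def inverse_precode(y, delay=1):
    """Inverse precoder (1 XOR D^delay): x[i] = y[i] XOR y[i - delay]."""
    return [bit ^ (y[i - delay] if i >= delay else 0)
            for i, bit in enumerate(y)]

x = [1, 0, 1, 1, 0, 0, 1]
for delay in (1, 2):
    y = precode(x, delay)
    print(delay, y, inverse_precode(y, delay) == x)
```

The inverse precoder has finite memory, so a single detector error in the channel input estimate affects at most two of its output symbols.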

II.  Characterization  of  MTR  Constraints 

We  define  the  maximum  run  of  accumulated  zero-distance  as  the 
maximum  number  of  branches  associated  with  two  distinct  trellis 
paths  that  have  the  same  output  labels,  i.e., 

$\tau \triangleq \max \{ n : \sum_{i=1}^{n} e_i^2 = 0 \}$, (1)

where $\{e_i\}$ is the channel-output error sequence with D-transform
$e(D) = (y(D) - \hat{y}(D))F(D)$. Clearly, the sequence detector suffers
from quasicatastrophic error propagation [1] if $\tau = \infty$.

It can be verified that traditional MTR(j,k) codes [2] do not avoid
quasicatastrophic error propagation in sequence detectors for
generalized partial-response channels that have spectral nulls both
at dc and the Nyquist frequency, i.e., $\tau = \infty$. A new constraint is
thus necessary to ensure that MTR codes are nonquasicatastrophic.
We say that the output sequence $\{x_i\}$ of an encoder satisfies a
"twins" constraint, or t-constraint, if it does not allow t+1
consecutive pairs of 0's or 1's ("twins") that are the complement of
an allowable string $x_i, x_{i+1}, \ldots, x_{i+2t+1}$ at the inverse precoder output.
The t-constraint can be characterized by a finite set of forbidden
strings and is therefore a shift of finite type. A special case of the
twins constraint was introduced in [3]. Traditional MTR codes that
also satisfy a twins constraint are referred to as MTR(j,k,t) codes. It
can be shown that for the new class of MTR(j,k,t) codes it holds
that $\tau = \max(j + 1, k + 1, 2t + 3) - L$. Hence, MTR(j,k,t) codes are
nonquasicatastrophic if and only if j, k, and t are finite.

Let $P_C(G,I)$ be the set of all allowable sequences at the output of a
$1/(1 \oplus D^2)$ precoder following a (G,I) modulation encoder.


Similarly, let $M(j,k,t)$ and $M_C(j,k,t)$ be the sets of all allowable
sequences at the $1/(1 \oplus D)$ precoder input and output, respectively,
where the MTR constrained system corresponds to the case of an
unconstrained detector trellis, i.e., $\hat{X} = \hat{Y} = B$.

Proposition 1 $P_C(G,I) = M_C(j = G+1, k = G+1, t = I)$

The above proposition states that (G,I) constraints are a subclass of
the (j,k,t) MTR constraints.

For channels with memory $L > j + 1$ the j-constraint can readily be
incorporated into the detector trellis to reduce the number of states
and/or branches, and to increase the capacity of the constrained
system M(j,k,t) by adding new potential code sequences that were
not allowed before. The new expanded constrained system is
denoted by M'(j,k,t), where the prime indicates that the generalized
partial-response detector trellis is j-constrained. For j = 2 and j = 3,
FSTDs have been constructed by tracking the run length of all four
phases of the pattern (0011) arriving at a state. Tables 1 and 2 list
the capacity of MTR constraints M'(j,k,t) for j = 2 and 3, with the
numbers truncated after the fourth digit following the decimal
point.
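The capacity of a constrained system described by an FSTD is the base-2 logarithm of the largest eigenvalue of the FSTD's adjacency matrix. The sketch below illustrates only that computation. It does not use the FSTDs for M'(j,k,t), which are not given here, but a much simpler stand-in constraint (runs of 1s of length at most j, no other restriction), and it assumes a primitive adjacency matrix so that power iteration converges. All names are hypothetical.

```python
import math

def capacity(adjacency, iterations=2000):
    """log2 of the largest eigenvalue of a primitive 0/1 matrix (power iteration)."""
    n = len(adjacency)
    v = [1.0] * n
    growth = 1.0
    for _ in range(iterations):
        w = [sum(adjacency[i][j] * v[j] for j in range(n)) for i in range(n)]
        growth = max(w)              # eigenvalue estimate, since max(v) == 1
        v = [x / growth for x in w]  # renormalise so that max(v) == 1
    return math.log2(growth)

def max_run_of_ones_graph(j):
    """State = current run of 1s (0..j). A 0 resets the run, a 1 extends it."""
    size = j + 1
    a = [[0] * size for _ in range(size)]
    for s in range(size):
        a[s][0] = 1          # emit 0
        if s < j:
            a[s][s + 1] = 1  # emit 1
    return a

def truncate4(value):
    """Truncate after the fourth digit following the decimal point."""
    return math.floor(value * 10 ** 4) / 10 ** 4

for j in (1, 2, 3):
    print(j, truncate4(capacity(max_run_of_ones_graph(j))))
```

The same `capacity` routine would apply to any FSTD once its adjacency matrix is written down. Only the graph construction is specific to the stand-in constraint.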
TABLE 1. Capacity of MTR constraints M'(j = 2, k, t)

 t \ k     2       3       4       5       6       7       8       9
   1    0.5514  0.6370  0.6819  0.7057  0.7189  0.7263  0.7305  0.7330
   2    0.6508  0.7472  0.7888  0.8090  0.8193  0.8248  0.8278  0.8294
   3    0.6792  0.7819  0.8264  0.8475  0.8581  0.8636  0.8664  0.8680
   4    0.6887  0.7900  0.8334  0.8538  0.8641  0.8694  0.8722  0.8737
   5    0.6922  0.7933  0.8365  0.8569  0.8671  0.8724  0.8751  0.8766
   6    0.6934  0.7941  0.8372  0.8575  0.8671  0.8728  0.8756  0.8771
   7    0.6939  0.7945  0.8375  0.8578  0.8679  0.8731  0.8759  0.8773
   8    0.6941  0.7946  0.8375  0.8578  0.8679  0.8732  0.8759  0.8774

TABLE 2. Capacity of MTR constraints M'(j = 3, k, t)

 t \ k     2       3       4       5       6       7       8       9
   1    0.6370  0.6942  0.7266  0.7444  0.7544  0.7599  0.7631  0.7650
   2    0.7472  0.8345  0.8707  0.8876  0.8960  0.9002  0.9024  0.9036
   3    0.7819  0.8670  0.9034  0.9203  0.9285  0.9326  0.9346  0.9357
   4    0.7900  0.8756  0.9115  0.9280  0.9359  0.9399  0.9419  0.9429
   5    0.7933  0.8781  0.9137  0.9301  0.9380  0.9419  0.9439  0.9450
   6    0.7941  0.8788  0.9143  0.9306  0.9385  0.9425  0.9444  0.9455
   7    0.7945  0.8790  0.9145  0.9308  0.9387  0.9426  0.9446  0.9456
   8    0.7946  0.8791  0.9145  0.9308  0.9387  0.9426  0.9446  0.9456
Abstract - Recent success of turbo-like coding
schemes on memoryless channels has sparked interest
in using them on intersymbol-interference (ISI) channels.
Decoders for turbo and low density parity check
(LDPC) codes perform much better with soft input
information which has to be supplied by the channel
detector as its soft output. We consider a class of ISI
channels commonly used to model magnetic recording
channels in a wide range of linear recording densities,
and show that simple soft output detectors are possible,
since the channel transfer functions belong to a
family of special polynomials.

I.  Introduction 

Let $\{x_n\}$, $x_n \in GF(2)$, be the possibly coded user data
sequence. We consider a discrete-time model for the magnetic
recording channel with input $\{a_n\}$, $a_n = 2x_n - 1$, impulse
response $\{h_n\}$, and output $\{y_n\}$ given by

$y_n = \sum_m a_{n-m} h_m + \eta_n$, (1)

where $\eta_n$ are independent, zero-mean, Gaussian random variables.
We separately consider the PR4 channel with the transfer
function $h(D) = \sum_n h_n D^n = 1 - D^2$, and higher order
partial response (PR) channels with $h(D) = (1 - D)(1 + D)^N$,
$N \ge 2$.

The optimal receiver for the magnetic recording channel model
performs maximum likelihood sequence estimation (MLSE),
i.e., it determines an $\{\hat{a}_n\}$ satisfying

$\min_{\{a_n\} \in C} \Omega(\{a_n\}) = \Omega(\{\hat{a}_n\})$,

where $\Omega(\{a_n\})$ is the well known log-likelihood function for
channels with inter-symbol interference:

$\Omega(\{a_n\}) = \sum_n \left( y_n - \sum_m a_m h_{n-m} \right)^2$. (2)

A general soft-output sequence estimation was introduced in
[1], and it is of course possible to get information on symbol
reliabilities by using techniques presented there. However, the
transfer functions of magnetic recording channels belong to a
family of special polynomials. We exploit that fact to derive
simple soft output detectors for these channels. We propose
two types of soft output channel detectors: one based on a
sequence detector for the PR4 channel, the other based on a
symbol-by-symbol detector enabled by special precoding for
higher order PR channels.

II.  The  PR4  Channel 

The maximum likelihood sequence detector for the $1 - D^2$
channel is realized by two interleaved Viterbi detectors
corresponding to the two constituent $1 - D$ channels whose
log-likelihood function is given by

$\Omega(\{a_n\}) = \sum_k [ y_k - (a_k - a_{k-1}) ]^2$.
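The decomposition of the $1 - D^2$ channel into two interleaved $1 - D$ channels can be verified on the noiseless output. The sketch below assumes that inputs before time 0 are zero and uses a hypothetical helper name. It only shows why two interleaved detectors suffice and does not implement the detector itself.

```python
def channel_output(a, h):
    """Noiseless output r[n] = sum_m h[m] * a[n - m], with a[n] = 0 for n < 0."""
    return [sum(h[m] * a[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(a))]

a = [1, -1, -1, 1, 1, 1, -1, 1]
pr4 = channel_output(a, [1, 0, -1])      # 1 - D^2 on the full sequence
even = channel_output(a[0::2], [1, -1])  # 1 - D on the even-indexed inputs
odd = channel_output(a[1::2], [1, -1])   # 1 - D on the odd-indexed inputs

print(pr4[0::2] == even, pr4[1::2] == odd)
```

Both comparisons hold because the PR4 output at time n depends only on inputs of the same parity as n.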


Common implementations of the MLSE use the recursive
difference metric algorithm of [2]. It was recognized in [2] that
the decision about extensions to both states at time n can be
made based on a single variable

$\delta_n = \Delta J_{n-1} - y_n$, where $\Delta J_n = [J_n(1) - J_n(-1)]/2$

and $J_n(s)$ is the minimum cost up to time n and state s,
$s \in \{-1, 1\}$:

$J_n(s) = \min_{\{a_k\} \in C,\ a_n = s} \sum_{k=-\infty}^{n} [ (a_k - a_{k-1}) y_k + a_k a_{k-1} ]$.

It can be easily shown that the difference in cost of the
surviving and discarded extension to state $s \in \{-1, 1\}$ at time n
is equal to $|\delta_n + s|$. Therefore, once the most likely symbol at
time n, $\hat{a}_n$, is known, the soft information about $\hat{a}_{n-1}$ (its
reliability) can be computed as $|\delta_n + \hat{a}_n|$. Under the assumption
that either the most likely path or the second best path is the
correct path and the assumption that only minimum distance
error events are possible, the two most likely paths can differ
in a string of consecutive 1s or a string of consecutive -1s.
Therefore the possible error event may have originated at any
time $k < n - 1$ such that $\hat{a}_k = \hat{a}_{k+1} = \cdots = \hat{a}_{n-1}$. All these
symbols are assigned the reliability of $\hat{a}_{n-1}$.

III.  Higher  Order  PR  Channels 

Let $\{w_n\}$, $w_n \in GF(2)$, be a sequence obtained from $\{x_n\}$ by
special processing known as precoding, and $\{a_n\}$ the channel
sequence in (1) obtained from $\{w_n\}$ as $a_n = 2w_n - 1$. For a
channel with the transfer function $h(D) = (1 - D)(1 + D)^N$,
we choose the precoder transfer function to be $1/(1 \oplus D)^{N+1}$,
as proposed in [3]. This gives

$W(D) = \frac{1}{(1 \oplus D)^{N+1}} X(D)$,

where $\oplus$ denotes the addition in GF(2). Such precoding gives
the following relation between the user data $\{x_n\}$ and the
channel noiseless output $r_n = \sum_m a_{n-m} h_m$:

$x_n = \left| \frac{r_n}{2} \right| \bmod 2$, (3)

which makes symbol-by-symbol channel detection possible.
The soft-output channel detection we propose relies on (3)
and some other features of these special ISI channels.
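Relation (3) follows because h(1) = 0 and h(D) reduces to $(1 \oplus D)^{N+1}$ modulo 2, so $r_n/2$ is congruent modulo 2 to the precoder input. The sketch below checks this numerically. It assumes a zero precoder initial state (so that $a_n = -1$ before time 0), and the function names are hypothetical.

```python
def poly_mul(p, q):
    """Product of two integer polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def channel_taps(order):
    """Integer taps of h(D) = (1 - D)(1 + D)^order."""
    h = [1, -1]
    for _ in range(order):
        h = poly_mul(h, [1, 1])
    return h

def precode(x, order):
    """w(D) = x(D) / (1 XOR D)^(order+1) over GF(2), zero initial state assumed."""
    c = [1]
    for _ in range(order + 1):
        c = [v % 2 for v in poly_mul(c, [1, 1])]
    w = []
    for n, bit in enumerate(x):
        for m in range(1, len(c)):
            if c[m] and n - m >= 0:
                bit ^= w[n - m]
        w.append(bit)
    return w

def noiseless_output(w, h):
    """r[n] = sum_m h[m] a[n-m] with a = 2w - 1 and a[n] = -1 before time 0."""
    a = [2 * b - 1 for b in w]
    return [sum(h[m] * (a[n - m] if n - m >= 0 else -1) for m in range(len(h)))
            for n in range(len(a))]

x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
for order in (2, 3):
    r = noiseless_output(precode(x, order), channel_taps(order))
    print(order, [abs(v) // 2 % 2 for v in r] == x)
```

For N = 2 the taps are those of EPR4, and in both cases the symbol-by-symbol rule recovers the user data from the noiseless output.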
Abstract - We develop union bounds for high rate
linear codes used for partial response equalized channels
with additive white Gaussian noise. One particular
application of the present setting is the computation
of bounds for magnetic recording systems using
turbo codes.

I.  Summary 

A recent application of turbo codes is in digital magnetic
recording [1]. So far, all of the studies on the subject, with
the exception of one [2], use Monte Carlo simulation with a
sub-optimal decoding algorithm to evaluate the performance
of the system. There clearly is a need to analyze the system
performance from a theoretical perspective. In [2], the authors
develop performance bounds for the turbo equalized dicode
($1 - D$) channel assuming maximum likelihood decoding by
using the union bounding technique. However, their result
cannot be used to predict the performance of a general partial
response (PR) equalized magnetic recording channel, such as
PR4 or EPR4.

In this paper, we develop the union bound for an arbitrary
partial response equalized channel when maximum likelihood
decoding is employed. The resulting bound is a generalization
of the results of [2]; however, our approach is totally different.


Figure 1: System block diagram.

The block diagram of the system is presented in Figure 1.
Consider the transmission of a block of $N_u$ information bits.
The information bits are first encoded by a high rate linear
code to obtain a coded sequence. The coded sequence is then
interleaved, and then may or may not be precoded. The
(precoded) bit sequence is then modulated ("1"s are mapped to
+1 and "0"s are mapped to -1) to obtain the channel
input. The channel is a partial response channel, which can
be described by a certain trellis.

In order to make the derivation of the bounds tractable,
we assume that the interleaver is uniform. Furthermore, we
assume that for any error event e, the squared Euclidean
distance between two codewords $b_1$ and $b_2$ with $b_1 \oplus b_2 = e$ is
approximately equal to the squared Euclidean distance
produced when these two codewords are not restricted to lie
within the code. This approximation is valid for high-rate
linear codes only, and it is the same approximation used in [2]
to find performance bounds for the dicode channel.

We assume that $N_0/2$ is the two sided power spectral density
of the noise, and we define the signal to noise ratio per
information bit as $\mathrm{SNR} = \frac{E}{R_c N_0}$, where $R_c$ is the underlying code
rate, and E is the energy of the PR channel. Let us denote the
number of codewords of the underlying code with information
weight i and total weight d by c(i, d). We show that


Pb  <  2 


Nu  N 

-"LEE 

»=1  d—0  d\, 


±-c(i,d)-^t{d,d\)Q 


where $P_b$ is the bit error probability, and $t(d, d_E^2)$ is the number
of different pairs of sequences with a Hamming distance d and
a squared Euclidean distance $d_E^2$. This quantity can be
computed using an extended state diagram of the partial response
target which lists the possible squared Euclidean distances
and the number of information bit differences between any
two pairs of sequences, that is, any two uncoded sequences.
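For a single pair of sequences, the two quantities counted by $t(d, d_E^2)$ can be computed directly. The sketch below is only an illustration of those two distances, not the extended state diagram. It assumes the mapping "1" to +1 and "0" to -1 described above, and includes the channel tail (full convolution), which is an assumption. The function names are hypothetical.

```python
def hamming_distance(b1, b2):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(u != v for u, v in zip(b1, b2))

def squared_euclidean_distance(b1, b2, target):
    """Squared distance between the noiseless PR outputs of two bit sequences."""
    # Difference of the +/-1 channel inputs: entries in {-2, 0, 2}.
    diff = [(2 * u - 1) - (2 * v - 1) for u, v in zip(b1, b2)]
    out = [0] * (len(diff) + len(target) - 1)
    for i, e in enumerate(diff):
        for m, h in enumerate(target):
            out[i + m] += e * h
    return sum(v * v for v in out)

DICODE = [1, -1]        # 1 - D
EPR4 = [1, 1, -1, -1]   # 1 + D - D^2 - D^3

zero = [0, 0, 0, 0]
for other in ([0, 1, 0, 0], [0, 1, 1, 0]):
    print(hamming_distance(zero, other),
          squared_euclidean_distance(zero, other, DICODE),
          squared_euclidean_distance(zero, other, EPR4))
```

On the dicode channel both error patterns give the same squared Euclidean distance although their Hamming distances differ, which is why the bound needs the joint count over d and $d_E^2$.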


Figure  2:  Bound  and  simulation  results  for  the  example. 


In Figure 2 we present the union bound for the rate 16/17
(5, 7) (in octal notation) convolutional code with an interleaver
length of N = 2048 for the EPR4 ($1 + D - D^2 - D^3$) channel.


II.  Conclusions 

We developed the union bound for a high rate linear code
used over a partial response equalized additive white Gaussian
noise channel. The bound is applicable for a general
partial response equalized channel with or without precoding.
Therefore, we now have a way of predicting the performance
of a coded magnetic recording system, or any other partial
response equalized system, with maximum likelihood decoding.
Abstract - This paper presents a concatenated coding
scheme for error correction in (0, k) constraint
channels with hard and soft decision decoding.

I.  Introduction 

In magnetic recording systems, constraints on the number
of consecutive like symbols sent on the channel are imposed
in order to maintain the clock synchronization at the
receiver. Runlength limited (RLL) sequences have
found almost universal application in optical and magnetic
disk recording practice [1]. Ref. [2] has shown a method to
find combined codes by modifying a known error control code
into a runlength limited code. The best codes obtained are
of short length because the lower bounds obtained for the k
constraint are prohibitively large for long codes. This paper
presents a concatenated coding scheme for error correction
in channels with (0, k) constraint based on a Reed-Solomon
code as an outer code and a runlength limited code obtained
by modifying a binary linear transparent block code [2] as an
inner code.

II.  Code  Construction 

The inner code $C_1$ is an $(n_1, k_1)$ modified version of a linear
transparent binary block code [2]. The outer code $C_2$
is a non-binary Reed-Solomon code with symbols of $k_1$ bits.
The encoding process is done in three steps. First, information
symbols are encoded by a conventional Reed-Solomon
encoder to form a code vector of $n_2$ symbols. In the next step, each
$k_1$-bit binary sequence is encoded into a code vector by $C_1$.
Finally, a modification vector is added to each $C_1$ codeword to
obtain a string of $n_2$ runlength limited code vectors of $C_1$.
Thus, the resulting code is an $(n_1 n_2, k_1 k_2)$ binary code. The
decoding process may be performed either by hard or soft
decision. Hard decision decoding is performed by first removing
the modification vector from each modified $C_1$ codeword as
it arrives at the receiver. Then, a conventional decoder for
the $C_1$ parent code is used to decode the $n_1$-bit codewords,
producing sequences of $k_1$ bits. Sequences of $n_2$ symbols are then
decoded by a conventional Reed-Solomon decoder to obtain an
estimate of the original message. Soft decision decoding may
be performed by using a minimal trellis representation of the
inner code. The branch labelling of the trellis must be modified
according to the corresponding modification vector [3].
Then, the Viterbi algorithm may be used to decode the inner
code. The RLL code is obtained from the method presented
in ref. [2]. The runlength constraint of this code is reached by
modifying the systematic generator matrix of a binary linear
transparent block error control code and then adding a suitable
coset leader that provides the best performance in terms
of runlength. The modification is made by means of column
permutations of the generator matrix of the parent code $C_1$
to obtain a lower bound for the k constraint. Because of the
linearity of the original code, the Hamming distance and the
correction capacity of the code are preserved.
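Whether a set of inner code vectors respects a run-length limit, within a vector and across any concatenation of vectors, can be checked mechanically. The sketch below is a generic checker under stated assumptions and not the construction of ref. [2]: code vectors are strings over "0" and "1", runs of either symbol are counted, and the function names and example words are hypothetical.

```python
import math

def leading_run(word, symbol):
    """Length of the run of `symbol` at the start of `word`."""
    n = 0
    for ch in word:
        if ch != symbol:
            break
        n += 1
    return n

def longest_run(word):
    """Longest run of identical symbols inside a single word."""
    best = run = 0
    prev = None
    for ch in word:
        run = run + 1 if ch == prev else 1
        prev = ch
        best = max(best, run)
    return best

def concatenation_run_bound(words):
    """Longest run of like symbols possible in any concatenation of the words."""
    if any(len(set(w)) == 1 for w in words):
        return math.inf  # a constant word can be repeated indefinitely
    bound = max(longest_run(w) for w in words)
    for symbol in "01":
        tail = max(leading_run(w[::-1], symbol) for w in words)  # trailing run
        head = max(leading_run(w, symbol) for w in words)
        bound = max(bound, tail + head)
    return bound

print(concatenation_run_bound(["0110", "1001"]))
print(concatenation_run_bound(["0011", "1100"]))
print(concatenation_run_bound(["0000", "0110"]))
```

A set of modified code vectors meets a given runlength limit in every concatenation exactly when this bound does not exceed the limit.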

III.  Results 

The proposed scheme is best explained by examples, which
are going to be shown in the presentation. Soft decision
decoding of the concatenated codes presents a coding gain of
about 3 dB over hard decision decoding.

IV.  Conclusion 

This paper presented a construction of a concatenated coding
scheme for error correction in channels with (0, k)
constraint. The parent binary codes and the bounds for the k
constraint may be selected from ref. [2]. Because of transparency,
all the code vectors (and any concatenation of them) of the
runlength limited inner code of the scheme satisfy the k
constraint. Hence, both the number of consecutive "zeros" and the
number of consecutive "ones" of any encoded sequence are bounded
by k without loss in coding rate. The proposed scheme allows the
construction of long error control codes with the same runlength
constraints as a small runlength limited code. The error correction
capacity of the inner code may be utilized for correcting random
errors. Bursts of errors affecting symbols can be corrected by
the outer code. Hence, the codes are effective against a mixture
of random and burst errors. An increase in the length
of the correctable burst can be obtained by interleaving the
symbols of the outer Reed-Solomon code.
Abstract - The class of punctured convolutional
codes was first constructed by starting with low
rate convolutional codes, and by periodically puncturing
single bits out of some code symbols in a time
varying trellis diagram. Thus, simplified Viterbi decoders
could decode the resulting codes, with only
two branches entering each state in the trellis diagram
[1]. This concept was ingeniously extended in
[2], to construct incrementally variable rate codes for
unequal error protection. Here we somewhat reverse
the above procedure, and name the resulting codes
"pruned codes". We now start with optimal high rate
convolutional codes, and periodically delete complete
code symbols and branches to obtain a time varying
trellis diagram. Hence, lower rate codes capable
also of correcting insertions and deletions can be
constructed.

I.  Pruned  Convolutional  Codes 

The sequences of high rate convolutional codes offer many
degrees of freedom for pruning. We show that, by judiciously
pruning these codes, lower rate codes can be obtained, capable
of correcting insertions/deletions, and also with an increased
free distance at the corresponding stages in the trellis diagram,
thus making possible unequal error protection.

It should be noted that the pruned codes are subcodes of
known good convolutional codes, hence complicating issues
such as catastrophic error propagation are avoided. In some
of our code constructions, the original punctured code
implementation advantage of a simplified Viterbi decoder with only
two branches remerging in each state can be retained. This
is possible if we start with a high rate base code, which is a
punctured convolutional code.

II. Example 

Our code construction procedure can perhaps be best
explained by an example.

The  time  varying  trellis  diagram  of  a  R  —  4  )  dmin  —  3  (i.e. 
reversal  error  correction,  t  —  1)  punctured  convolutional  code, 
with  octal  generators  5,7,  5,  7  from  [1],  can  be  depicted  with 
a  R  =  ^,h  —  1,2  trellis  diagram. 

The  R  =  |  code  is  selectively  pruned  to  obtain  a  rate  of 
R=  This  code  now  has  dm,n  =  8,  t  =  3,  and  the  remaining 
code symbols represent a single insertion/deletion correcting
code (i.e. s = 1) since each n = 4 bit symbol complies with
the condition in [3]:

$\sum_{i=1}^{4} i x_i \equiv a \pmod{m}$, (1)

for some fixed integers a and m, where $m \ge n + 1$. Here
a = 0 and m = 5.
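Condition (1) with n = 4, a = 0 and m = 5 can be enumerated directly. The sketch below lists the 4-bit symbols satisfying it and checks that their single-deletion sets are pairwise disjoint, which is what allows one deletion to be corrected once the symbol boundaries are known. The function names are hypothetical.

```python
from itertools import combinations, product

def vt_codebook(n, a, m):
    """All binary n-tuples x with sum_{i=1..n} i * x_i congruent to a mod m."""
    return [w for w in product((0, 1), repeat=n)
            if sum(i * x for i, x in enumerate(w, start=1)) % m == a]

def single_deletions(word):
    """All words obtained from `word` by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

book = vt_codebook(4, 0, 5)
print(book)
print(all(single_deletions(u).isdisjoint(single_deletions(v))
          for u, v in combinations(book, 2)))
```

The codebook consists of 0000, 0110, 1001 and 1111, so each 4-bit symbol can carry at most two information bits under this condition.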

In general, before effecting single insertion/deletion correction
with codes complying with (1), the boundaries of the code
word need to be known. This can be effected with marker
sequences, or more productively with the remaining code
symbols forming marker code books [4], which enable the
simultaneous transmission of data.

An  alternative  pruning  of  the  R  =  |  code  can  then  be 
done.  The  resulting  R~\  code  also  has  dmin  —  8,  and  t  =  3. 
Each pair of code symbols, exiting a state, now forms a marker
code book from [4]. This alleviates the boundary problem.
Furthermore, each pair of n = 4 bit symbols also forms an
s = 1 insertion/deletion correcting code, due to the repetition
of 3 bits within the symbol.

These flexible codes can be used as building blocks, which
can be concatenated in many different ways, to protect
sensitive data files, such as multimedia, with unequal error
protection against reversal (i.e. additive) errors, or against
insertions/deletions during synchronization failures.

It should be noted that there is a trade off. Although the
lower rate codes may have a suboptimum $d_{min}$, the advantage
of correcting insertions/deletions is obtained.
Abstract - Convolutional codes with unequal information
protection are investigated. Lower bounds
on the free distance of time-varying codes are derived
and compared to previous bounds. The asymptotic
behavior of these bounds leads to the conclusion that
significant gains for the important data are attainable
by enlarging the corresponding constraint length.
This comes at the cost of reduced performance for the
less significant data.

We consider codes with two importance levels wherein
each message block is divided into two significance levels:
$m = (m_1, m_2)$, $m_i \in M_i = (GF(2))^{k_i}$ [1]. Consequently,
the data portion at the encoder input corresponding to $m_1$ is
represented by a binary $k_1$-tuple, that corresponding to $m_2$ is
represented by a binary $k_2$-tuple, and the concatenated binary
$k = k_1 + k_2$ vector comprises the encoder input corresponding to
$m = (m_1, m_2)$. Suppose that a block code is used with the
encoding function $c : m \mapsto c(m)$; then the separation of the
code, $s = (s_1, s_2)$, is defined as

$s_i = \min_{(m, m') :\, m_i \ne m_i'} d(c(m), c(m'))$, $i = 1, 2$,

where $d(\cdot, \cdot)$ is any metric defined on the set $\{c(m)\}$.
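For a small block code the separation can be computed by exhaustive search. The sketch below takes the metric to be Hamming distance and uses a toy binary generator matrix that protects the first message bit by repetition. Both choices, and the function names, are illustrative assumptions and not taken from the abstract.

```python
from itertools import product

def encode(message, generator):
    """Binary linear encoding c(m) = m * G over GF(2)."""
    n = len(generator[0])
    return tuple(sum(m * row[j] for m, row in zip(message, generator)) % 2
                 for j in range(n))

def separation_vector(generator, k1):
    """(s_1, s_2): minimum distance over message pairs that differ in part i."""
    k = len(generator)
    messages = list(product((0, 1), repeat=k))
    result = []
    for part in (slice(0, k1), slice(k1, k)):
        best = None
        for m in messages:
            for mp in messages:
                if m[part] == mp[part]:
                    continue  # only pairs with m_i != m_i' are considered
                dist = sum(a != b for a, b in
                           zip(encode(m, generator), encode(mp, generator)))
                if best is None or dist < best:
                    best = dist
        result.append(best)
    return tuple(result)

# Toy code with k_1 = k_2 = 1: m_1 is repeated three times, m_2 is sent once.
G = [[1, 1, 1, 0],
     [0, 0, 0, 1]]
print(separation_vector(G, 1))
```

Here the first message bit has separation 3 and the second has separation 1, so the two parts of the message are protected unequally.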

Herein, as we deal with convolutional codes, the separation
definition extends to free distances or active row distances
[2] evaluated on output sequences generated while the input
sequences are constrained to $m_i \ne m_i'$.

For the class of time varying convolutional codes with period
T we seek to answer the following. Given that the encoder
input is a binary k-tuple, the code complexity is fixed
at $k\nu = k_1\nu_1 + k_2\nu_2$ and the branch length equals N, what is
the set of attainable separation vectors?

Let $u_t$, $t \in (-\infty, +\infty)$, denote the encoder input k-tuple
at time t and let $U_{[t-\nu, t+j+\nu]}$ denote the set of information
sequences $u_{t-\nu} u_{t-\nu+1} \ldots u_{t+j+\nu}$ such that the first $\nu$ and last
$\nu$ subblocks (k-tuples) are zero and such that they do not
contain $\nu + 1$ consecutive zero subblocks. Further, let $S_h$ denote
the set of information sequences $U_{[t-\nu, t+h+\nu]}$, $0 < h < T$, and
let $S_h^{k_1}$ denote the set of information sequences in $S_h$ which
differ from the all zero sequence at least on the $k_1$ data
section. Furthermore, let $S_h^{k_2}(k_1)$ denote the set of information
sequences in $S_h$ which differ from the all zero sequence only
on the $k_2$ section while the $k_1$ section is identical to that of the
all zero sequence (alternatively, to the $k_1$ section of the correct
sequence).

Let $F(h, d_1)$ denote the fraction of codes with a nonzero
codeword of weight less than $d_1$ produced by an information
sequence from the set $S_h^{k_1}$. Similarly let $F(h, d_2 | k_1)$ denote the
fraction of codes with a nonzero codeword of weight less than
$d_2$ produced by an information sequence in $S_h^{k_2}(k_1)$. With
these definitions we have the following

Lemma 1: A sufficient condition for the existence of a code
that has minimum codeword weight not smaller than $d_2$, and
codeword weight of at least $d_1$ for information sequences
differing on $k_1$, where $d_1 > d_2$, is

$\sum_{h=1} \left[ F(h, d_1) + F(h, d_2 | k_1) \right] < 1$. (1)


Using this result, Costello's [3] technique is extended to the
ensemble of unequal error-protection time varying codes.

Lemma 2: Consider the ensemble $\mathcal{E}(k, N, \nu, T)$ of binary,
rate $R = (k_1 + k_2)/N$, periodically time-varying convolutional
codes encoded by polynomial generator matrices of memory
length $\nu_1$ for the $k_1$ important bits and $\nu_2$ for the $k_2$ less
important bits, where $k\nu = k_1\nu_1 + k_2\nu_2$. The fraction of codes
whose  jth  order  active  row  distances  aj  (2)  ,  0,  (1),  a3  (1)  > 
Oj  (2),  0  <  j  <  T,  satisfy  respectively 
aj  (2)  <  a,(2)  <  (j  +  1/2  +  l)N/2  or 
o,(l)  <  Oj(l)  <  (j  +  1/1  +  l)N/2 
does  not  exceed 

T2(j+1/1  +i)jv  ^-1) (1  _ 

+T20^2+i)«r(3^rR2+fl(u^w)-i)  ) 


where  H(-)  denotes  the  binary  entropy  function. 

As  a  corollary  to  Lemma  2  we  derive  a  lower  bound  on  the 
corresponding  active  row  distances. 

Our  main  conclusion  is  that,  in  the  asymptotic  case  of  large 
v,  non-uniform  error  protection  is  feasible  by  splitting  the 
memory  unevenly  between  mi. 
Abstract - A new construction of dc-free error-correcting
codes based on convolutional codes is proposed. The encoder
employs a Viterbi algorithm as the codeword selector so that
the selected code sequences satisfy the dc constraint. Some
important parameters, including the free distance, the running
digital sum (RDS), and the sum variance, are investigated.


I.  Construction  and  Free  Distance 


Figure 1: Dc-free encoder. (Block diagram: the source input $a(D)$ feeds
the encoder of convolutional code $C_1$; the codeword selector over
convolutional code $C_2$ supplies $b(D)$; the codeword output is $y(D)$.)


Our construction of dc-free error-correcting codes can be
described by the encoder shown in Figure 1. Codes $C_1$ and $C_2$
are $(n,k_1)$ and $(n,k_2)$ binary linear convolutional codes,
respectively, with $C_1 \cap C_2 = \{0\}$. The information sequence
$a(D)$ is first encoded to a code sequence $x(D)$. The code
sequence $x(D)$ is then used by the codeword selector, which
produces a code sequence $b(D) \in C_2$ so that the final modified
sequence $y(D) = x(D) \oplus b(D)$ satisfies the dc constraint.
To ensure dc-free transmission, two codeword selection criteria
are proposed: the minimum absolute RDS (MRDS) criterion
[1] and the minimum squared weight (MSW) criterion [2]. The
codeword selector, based on the MRDS or MSW criterion, is
implemented by a Viterbi algorithm (VA) with proper metric
assignment. To reduce the decoding complexity, a suboptimal
decoder is proposed that is implemented by a VA decoder
operating over the minimal trellis of the convolutional code
$C_{12} = C_1 \oplus C_2$. The free distance $d_b$ of the newly constructed
code $C_b$ is obviously bounded by $d_b \ge d_{12}$, where $d_{12}$ is the
free distance of $C_{12}$. Define $w[H]$ as the minimum nonzero
weight of codewords in $H \subseteq C_{12}$. A tighter bound is given in
the following theorem.

Theorem 1 Let $d_b$ be the free distance of $C_b$; then
$d_b \ge w[C_{12} \setminus C_2]$.

Define $d_{\mathrm{eff}} = w[C_{12} \setminus C_2]$. For a Viterbi decoder operating on
the trellis of $C_{12}$, $d_{\mathrm{eff}}$ is exactly the free distance attained by
this suboptimal decoding scheme. A procedure is proposed for
determining $d_{\mathrm{eff}}$ based on a minimum-weight codeword search
over $C_{12}$.

II.  Running  Digital  Sum  and  Sum  Variance 

We present a sufficient condition for the codes to have bounded
RDS. Define $D(P)$ as the set of disparities of all binary vectors
in $P$. The polynomial generator matrix $G_2(D)$ of $C_2$ can be
expressed as $G_2(D) = G_2^{(0)} \oplus G_2^{(1)}D \oplus \cdots \oplus G_2^{(a)}D^{a}$. Define the
binary generator matrices $G_{2,r}$ for $r = 1, \ldots, a+1$ as

$$G_{2,r} = \bigl[[G_2^{(0)}, \ldots, G_2^{(r-1)}]^T,\ [0, G_2^{(0)}, \ldots, G_2^{(r-2)}]^T,\ \ldots,\ [0, \ldots, 0, G_2^{(0)}]^T\bigr].$$

Define $\Sigma$ as the set of all possible states in the trellis of $C_2$,
$A^{(r)}$ as the set of all possible $rn$-tuple binary outputs from
$C_1$, and $\beta(\sigma)$ as the $rn$-tuple binary output of the encoder of
$C_2$ with initial state $\sigma$ and an all-zero input.

Theorem 2 The RDS of $C_b$ is bounded if there exists
some $r$, $r = 1, 2, \ldots, a+1$, such that, for arbitrary $x \in A^{(r)}$
and $\sigma \in \Sigma$, the set $D(\langle G_{2,r} \rangle \oplus \beta(\sigma) \oplus x)$ contains opposite
polarities or a zero.

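Let $t$ be the smallest value of $r$ that satisfies the condition of
Theorem 2. By a simplified codeword selection algorithm, the sum
variance of the new code can be shown to be

$$A - B, \qquad (1)$$

with

$$A = \frac{1}{n'} \sum_{z \in \Omega,\ \sigma \in \Sigma} P(Z_m = z, S_m = \sigma) \sum_{j=1}^{n'} \sum_{x \in \mathcal{A}} P(X_m = x)\,\bigl(z + \gamma_j(b_{z,\sigma}(x) \oplus x)\bigr)^2,$$

$$B = \Bigl[\frac{1}{n'} \sum_{z \in \Omega,\ \sigma \in \Sigma} P(Z_m = z, S_m = \sigma) \sum_{j=1}^{n'} \sum_{x \in \mathcal{A}} P(X_m = x)\,\bigl(z + \gamma_j(b_{z,\sigma}(x) \oplus x)\bigr)\Bigr]^2,$$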

where $n' = tn$, $\mathcal{A}$ is the set of all possible outputs of length
$n'$ of $C_1$, $\gamma_j(b_{z,\sigma}(x) \oplus x)$, $j = 1, 2, \ldots, n'$, are the disparities
of $b_{z,\sigma}(x) \oplus x$, $\Omega$ is the set of all possible RDS values, $\Sigma$ is
the set of all possible states in the trellis of $C_2$, and $b_{z,\sigma}(x)$
is the $n'$-tuple binary output of $C_2$ that corresponds to the
path beginning at state $(z,\sigma)$ and minimizing the absolute
value of $z + \gamma_{n'}(b_{z,\sigma}(x) \oplus x)$. The state $(Z_m, S_m)$ can be cast
into a Markov chain model, and the stationary probabilities
$P(Z_m = z, S_m = \sigma)$ can be evaluated by a simple matrix
inversion. The results are then substituted into (1) to obtain
the sum variance.
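
As a rough illustration of the two quantities studied in this section, the following sketch computes the running digital sum and an empirical sum variance of a binary sequence. The bit-to-symbol map (0 to -1, 1 to +1) and the function names are assumptions made for the example. The sketch estimates the variance from a single sequence, not from the stationary Markov-chain probabilities used in (1).

```python
from itertools import accumulate


def running_digital_sum(bits):
    """Return the RDS after each symbol (assumed map: bit 0 -> -1, bit 1 -> +1)."""
    return list(accumulate(2 * b - 1 for b in bits))


def sum_variance(bits):
    """Empirical sum variance: mean of RDS^2 minus the squared mean of the RDS."""
    rds = running_digital_sum(bits)
    mean = sum(rds) / len(rds)
    mean_sq = sum(z * z for z in rds) / len(rds)
    return mean_sq - mean * mean


alternating = [1, 0] * 8  # the RDS toggles between +1 and 0
print(running_digital_sum(alternating)[:4])
print(sum_variance(alternating))
```

The difference "mean square minus squared mean" has the same form as the expression A - B above; a sequence whose RDS drifts without bound would make this estimate grow with the sequence length.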
Abstract - A novel DC-free binary convolutional
coding scheme is presented. The proposed scheme
achieves DC-free transmission and error-correcting
capability simultaneously. The scheme has a simple cascaded
structure consisting of an RDS (running digital sum) control
encoder and a conventional convolutional encoder.
It provides a wide variety of reasonable tradeoffs
between the coding gain, the RDS constraint, and the
decoding complexity.

I.  Introduction 

DC-free coding is widely used in digital communication
and in magnetic/optical recording. We present here a
DC-free convolutional coding scheme with error-correcting
capability. Figure 1 illustrates the configuration of the proposed
coding scheme. First, the user message sequence is encoded
into an intermediate sequence by an RDS control encoder. The
convolutional encoder then converts the intermediate sequence
into the coded sequence. After the binary-bipolar conversion,
the coded sequence is transmitted over the channel. The scheme
is suitable for a power-limited noisy channel. Availability of
soft-decision decoding is one of the major advantages of the
proposed scheme. By using the RDS bounds derived in this
research, we can guarantee that the RDS values obtained from
the proposed scheme stay within a certain bounded range.
The proposed scheme is based on the following three major
ideas: (1) additive encoding using a binary linear block code,
(2) upper and lower bounds on the RDS for an additive encoder,
and (3) splitting a convolutional code into an infinite sequence
of linear binary block codes, each of which is called a window code.

Figure 1: DC-free convolutional coding scheme

II.  Additive  encoder 

Assume an infinite-length binary message sequence
$\{a_0, a_1, \ldots\}$. Each vector $a_i$ ($i = 0, 1, 2, \ldots$) belongs to
$F_2^{k_1}$. An additive encoder encodes a message block $a_i$ to
$c_i \in C$ for each block index $i$. The code $C$ is a binary linear
code of length $n$. The resulting sequence $\{c_0, c_1, \ldots\}$ is called
a coded sequence. The additive encoder appends
$k_0 = n - k_1$ redundancy bits per block, and thus the coding rate
becomes $k_1/n$. After the binary-bipolar conversion, the bipolar
sequence $\{f(c_0), f(c_1), \ldots\}$ is transmitted over the noisy
channel, where $f$ is the binary-to-bipolar conversion map.
To achieve DC-free transmission, the additive encoder has
to generate a coded sequence that satisfies an RDS constraint.
An additive encoder encodes a message block $a_i$ into $c_i$ as
follows:

$$c_i = b_i G_0 \oplus a_i G_1,$$

where $b_i \in F_2^{k_0}$ is selected by the additive encoder according
to the value of the RDS and a selection rule. The matrices $G_0$
and $G_1$ span subspaces of $C$. We call the vector $b_i$ the control
vector. In other words, the additive encoder is free to select
the control vector and should choose it so as to obtain a code
sequence whose RDS value stays bounded. In this setting, upper
and lower bounds on the RDS for the additive encoder have been
derived.
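
The additive encoding rule above can be sketched on a toy example. Everything specific here is an assumption made for illustration and is not taken from the paper: the block length n = 4, the matrices G0 (a single all-ones row, so k0 = 1) and G1 (k1 = 3), the bipolar map 0 to +1 and 1 to -1, and the selection rule, which simply picks the control vector giving the smallest absolute RDS at the end of the block.

```python
from itertools import product

N = 4                      # block length n (hypothetical toy value)
G0 = [[1, 1, 1, 1]]        # control generator, k0 = 1 (all-ones row flips polarity)
G1 = [[1, 0, 0, 0],        # message generator, k1 = 3
      [0, 1, 0, 0],
      [0, 0, 1, 0]]


def vec_mat(v, G):
    """Multiply a binary row vector by a binary matrix over GF(2)."""
    return [sum(v[r] * G[r][c] for r in range(len(G))) % 2 for c in range(len(G[0]))]


def bipolar(c):
    """Binary-to-bipolar map f (assumed here: 0 -> +1, 1 -> -1)."""
    return [1 - 2 * bit for bit in c]


def encode_block(a, rds):
    """Return (c, new_rds) for the control vector b minimising |RDS| at block end."""
    best = None
    for b in product([0, 1], repeat=len(G0)):
        c = [x ^ y for x, y in zip(vec_mat(list(b), G0), vec_mat(a, G1))]
        end_rds = rds + sum(bipolar(c))
        if best is None or abs(end_rds) < abs(best[1]):
            best = (c, end_rds)
    return best


def encode(messages):
    """Encode a list of message blocks, tracking the RDS at each block boundary."""
    rds, codewords, trace = 0, [], []
    for a in messages:
        c, rds = encode_block(a, rds)
        codewords.append(c)
        trace.append(rds)
    return codewords, trace


def decode_block(c):
    """Recover a from c; in this toy code the last coordinate equals the control bit."""
    b = c[3]
    return [c[0] ^ b, c[1] ^ b, c[2] ^ b]
```

With this particular choice of G0 the encoder can always send either a block or its complement, so the RDS at block boundaries never exceeds n in magnitude. The paper's bounds concern general decompositions, which this sketch does not attempt to reproduce.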

III.  DC-free  convolutional  code 

The idea of additive encoding can be applied to binary
convolutional codes. The main idea is to apply the additive
encoding method to window codes obtained from a convolutional
code. A window code is a binary linear block code
obtained by splitting a convolutional code into an infinite
series of blocks. Thus, the RDS bound for block codes can be
extended to the case of convolutional codes. For a given window
code, we need a good decomposition of the window code
to achieve a tight RDS constraint. We have performed
exhaustive computer searches. Table 1 presents the results for
the case where the base convolutional code is the rate-1/2, 64-state
convolutional code with $d_{\mathrm{free}} = 10$. For example, a 64-state
DC-free coding scheme with overall rate 6/16 satisfies a
bounded RDS condition (from $\mathcal{L} = -18$ to $\mathcal{U} = +18$) and
yields an asymptotic coding gain (ACG) of 5.7 dB. We have
performed encoding simulations as well; their results are also
shown in Table 1.


Table 1: Results of searches and simulations

R       $\mathcal{L}$    $\mathcal{U}$    L      U      ACG (dB)
5/14    -13    +13    -11    +13    5.53
4/14    -8     +8     -6     +8     4.56
3/14    -7     +7     -7     +5     3.31
6/16    -18    +18    -12    +13    5.74
5/16    -9     +9     -7     +9     4.95
4/16    -7     +7     -5     +7     3.98

R: overall coding rate
$\mathcal{L}$ and $\mathcal{U}$: theoretical lower and upper bounds on the RDS
L and U: RDS extremes observed in the encoding simulation
ACG = $10 \log_{10}(R\, d_{\mathrm{free}})$
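
The ACG column can be checked directly from the formula in the table notes. The short sketch below recomputes it for each listed rate, using the free distance of 10 stated in the text; the function and variable names are illustrative only.

```python
from math import log10

D_FREE = 10  # free distance of the base rate-1/2, 64-state code (from the text)


def acg_db(k, n, d_free=D_FREE):
    """Asymptotic coding gain 10*log10(R * d_free) for overall rate R = k/n."""
    return 10 * log10(k / n * d_free)


# (k, n) -> ACG value listed in Table 1
table_1 = {(5, 14): 5.53, (4, 14): 4.56, (3, 14): 3.31,
           (6, 16): 5.74, (5, 16): 4.95, (4, 16): 3.98}

for (k, n), listed in table_1.items():
    print(f"R = {k}/{n}: computed {acg_db(k, n):.2f} dB, listed {listed:.2f} dB")
```

All six computed values agree with the table after rounding to two decimals.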
Abstract - Systems employing variable-length source codes are
prone to error propagation. Several techniques that involve varying
levels of cooperation between the channel decoder and source
decoder are considered for improving performance. At one extreme,
conventional tandem decoding performs channel decoding and source
decoding independently; at the other extreme, joint source-channel
MAP decoding combines the two into a single decoder. Simulation
results indicate that joint source-channel list decoding with “trellis
pruning” can result in significant improvement over conventional
tandem decoding.

I.  Introduction 

Most efficient data communication systems employ source coding
(compression) and channel coding (error control). Variable-length
source codes (VLCs, e.g., Huffman codes) and convolutional channel
codes are commonly employed in such systems, and the receiver
typically uses a tandem decoding scheme, i.e., a maximum-likelihood
(ML) Viterbi decoder for the channel code followed by an independent
source decoder. When the data are transmitted over a noisy channel,
error propagation results if a bit error causes the source decoder to
incorrectly interpret a source codeword as a codeword of a different
length. While VLCs exhibit an impressive ability to resynchronize,
the resulting shift in the decoded sequence (relative to the transmitted
sequence) is often considered catastrophic.

II.  Methods  for  Mitigating  Error  Propagation 

Since error propagation is associated with symbol
additions/deletions, one approach to limiting propagation is to packetize
the data and convey to the decoder the bit-size and symbol-size of
each packet. (Typically, one of these is a fixed parameter of
the protocol and the other is reliably conveyed to the decoder, e.g.,
as part of the packet header, protected with a powerful block code.)
Several methods for exploiting this side information to improve the
probability-of-symbol-error performance of the system are discussed
below.

Tandem  Decoding:  The  conventional  scheme  consisting  of  a 
channel  decoder  and  a  source  decoder  operating  independently  in 
tandem  is  used  as  the  baseline  for  comparison.  The  channel  decoder 
performs  maximum  likelihood  (ML)  Viterbi  decoding,  and  “tosses” 
its  estimated  sequence  “over  a  wall”  to  the  source  decoder,  which 
maps  it  onto  the  corresponding  source-symbol  sequence.  If  too  many 
symbols  are  generated,  the  extra  ones  are  discarded;  if  too  few  symbols 
are  generated,  the  sequence  is  padded. 

List  Decoding:  In  list  decoding  [1],  the  channel  decoder  is 
modified  so  that,  at  each  decoding  stage,  the  decoder  retains  the  L 
most  likely  paths  among  those  paths  merging  at  a  given  state.  After 
decoding  a  block,  the  L  best  paths  among  the  survivors  are  selected 
and  provided  to  the  source  decoder.  The  source  decoder  decodes  each 
of  the  L  sequences  and  selects  from  the  resulting  sequences  the  most 
likely  path  that  maps  to  the  correct  number  of  symbols. 

'Supported  in  part  by  NSF  grant  CCR-99-96222. 


Source  Decoder  Assisted  List  Decoding:  This  is  a 
modification  of  the  above  list-decoding  scheme;  with  this  strategy, 
the  channel  decoder  is  provided  with  information  about  the  length  (in 
source  symbols)  of  each  path  through  the  trellis.  The  list  decoder, 
when  selecting  the  survivors  into  a  state,  selects  the  L  best  paths  with 
distinct  symbol  lengths. 

Trellis  Pruning:  Here,  the  channel  decoder  is  provided  at 
each  decoding  stage  with  an  indication  whether  the  path  has  a  valid 
extension  with  the  correct  symbol  length.  Any  path  with  no  valid 
extension  is  eliminated  from  consideration. 

Hybrid Schemes: List decoding and trellis pruning can be combined
in a straightforward manner, resulting in a further improvement
in performance. Combining source-assisted list decoding with trellis
pruning is expected to improve performance further.

Example: Consider a memoryless source with alphabet
{a, b, c, d, e, f, g, h} and probabilities {0.75, 0.15, 0.07, 0.02, 0.007,
0.002, 0.0007, 0.0003}. The source code is a Huffman code; the
channel code is the convolutional code with generator
$G(D) = (1 + D^2,\ 1 + D + D^2)$, followed by BPSK modulation. Simulation
results illustrating the performance of the above schemes are shown
in Figure 1. Increasing cooperation results in improved performance,
and list decoding with trellis pruning provides the best performance
for this example. Note that increasing the list size from L = 2 to
L = 3 yields only a small improvement.
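
The two codes of this example can be reproduced with a short sketch. It derives Huffman codeword lengths for the stated probabilities and encodes bits with the generator $G(D) = (1 + D^2, 1 + D + D^2)$. The function names are illustrative, and the encoder assumes a zero initial state without trellis termination, which the text does not specify.

```python
import heapq
from itertools import count

PROBS = {"a": 0.75, "b": 0.15, "c": 0.07, "d": 0.02,
         "e": 0.007, "f": 0.002, "g": 0.0007, "h": 0.0003}


def huffman_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    tiebreak = count()  # keeps heap entries comparable if probabilities tie
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p0, _, s0 = heapq.heappop(heap)
        p1, _, s1 = heapq.heappop(heap)
        for s in s0 + s1:  # every symbol under the merged node moves one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (p0 + p1, next(tiebreak), s0 + s1))
    return lengths


def conv_encode(bits):
    """Rate-1/2 encoder for G(D) = (1 + D^2, 1 + D + D^2), zero initial state."""
    s1 = s2 = 0  # s1 = previous input bit, s2 = the input bit before that
    out = []
    for u in bits:
        out += [u ^ s2, u ^ s1 ^ s2]
        s1, s2 = u, s1
    return out


lengths = huffman_lengths(PROBS)
print(lengths)
print(sum(PROBS[s] * lengths[s] for s in PROBS))  # average codeword length
print(conv_encode([1, 0, 0]))                     # impulse response of the encoder
```

For this source the codeword lengths come out as 1, 2, 3, 4, 5, 6, 7, 7 for a through h, with an average of about 1.394 bits per symbol. The wide spread of lengths is what makes a single bit error liable to shift the symbol boundaries.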


Fig. 1. Performance of various decoding schemes discussed in the text
for the system outlined in the example (horizontal axis: $E_b/N_0$ in dB).
Abstract - We present a new class of channel
codes, which we call Source Optimized Channel Codes
(SOCCs). These non-linear codes are designed to
maximize a given analogue quality measure in
consideration of source and channel statistics.


I.  Introduction 

Unlike conventional channel coding, which usually minimizes
the residual bit or sequence error rate, we design a new
class of non-linear block codes which minimizes a given quality
measure in the domain of continuous-valued source encoder
symbols, e.g., parameters of a speech encoder. These codes
are called Source Optimized Channel Codes (SOCCs) [1, 2].
At the receiver, we do not exploit the code redundancy for
error correction, but for parameter estimation [3]. The
performance of SOCCs is compared to that of a reference system
which was developed at the Institute for Communications
Engineering at Munich University of Technology [6]. This
reference employs rate-compatible convolutional codes [4] for
Unequal Error Protection (UEP) and Source Controlled Channel
Decoding (SCCD) [5].


II.  Communication  Model 


Using the model shown in Figure 1, we simulate a
block-oriented speech transmission. The source encoder is
represented by a vector source producing $L$-dimensional real-valued
parameter vectors $\mathbf{u} = (u_1, \ldots, u_L)$. To mimic residual
inter-frame correlation, each component $u_i$ is independently
modeled by a Gaussian low-pass source with $\varphi_{u_i u_i}(1) = \rho$.
Each vector component is quantized independently. Instead
of conventional linear channel encoding as used, e.g., in mobile
telecommunications, we apply non-linear Source Optimized
Channel Codes (SOCCs) to encode the quantized parameter
vectors $\bar{\mathbf{u}}$ to a binary channel sequence $\mathbf{x}$. At the receiver,
parameter estimates $\hat{\mathbf{u}}$ are extracted from the observed soft
bit sequence $\mathbf{y}$ by Softbit Source Decoding (SBSD) [3].
Fig. 1: Communication model ($N$ bits enter the AWGN channel, $N$ soft
bits reach the estimator, which outputs $L$ dimensions).
SE: parameter source (model of the source encoder),
Q: quantizer, SD: parameter sink (source decoder)

III.  Source  Optimized  Channel  Codes 

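We assume a given quality measure $\mathcal{D}(\bar{\mathbf{u}}, \hat{\mathbf{u}})$ and a statistical
model of the transmission channel $\mathbf{y} = \tau(\mathbf{x})$, which is described
by $P_{\mathbf{y}|\mathbf{x}}(\mathbf{y} \mid \mathbf{x})$. The optimal decoder (estimator) with respect
to $\mathcal{D}$ and $\tau$ is denoted by $\hat{\mathbf{u}} = f_{\mathcal{D},\tau}(\mathbf{y})$. Then we define a SOCC
as a set of channel symbols

$$\mathcal{C} = \{\mathbf{x} \mid \mathbf{x} = \Phi[\bar{\mathbf{u}}],\ \bar{\mathbf{u}} \in \mathcal{U}\}, \qquad (1)$$

which results from solving the optimization problem

$$E\{\mathcal{D}(\bar{\mathbf{u}}, f_{\mathcal{D},\tau}(\tau(\Phi[\bar{\mathbf{u}}])))\} = \min, \qquad (2)$$

where $E\{\cdot\}$ denotes expectation. Hence, SOCCs minimize the
mean distortion $\mathcal{D}(\bar{\mathbf{u}}, \hat{\mathbf{u}})$ measured between quantized and
estimated parameter vectors.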

IV.  SOCC  Performance 

Figure 2 depicts a performance comparison between
SOCC/SBSD and UEP/SCCD at a transmission rate of 4 bits
per vector dimension. Three values of residual inter-frame
correlation are considered: $\rho = 0$, 0.75 and 0.9. SOCC/SBSD
outperforms UEP/SCCD for all channel conditions if $\rho > 0.75$.
For $\rho = 0.9$, a gain in parameter SNR of at least 1 dB and up
to 3 dB can be observed. In addition, the SOCC/SBSD system
exhibits a graceful analogue-type degradation, whereas
UEP/SCCD shows the well-known threshold effect.


Fig.  2:  SOCC/SBSD  vs.  UEP/SCCD,  4  bits  per  dim. 
Abstract - In digital transmission of speech, audio,
image and video signals, residual redundancy is often
left after source coding due to complexity and
delay constraints. This redundancy remains both within
one block or frame and in the time correlation
between subsequent frames. Both kinds of redundancy are
used in an iterative process of source and channel
decoding to improve the quality of the transmitted
parameters. For better understanding, Gaussian distributed
and time-correlated parameters are used.

In [1] it is shown that a priori knowledge can be used either
in channel or in source decoding. In [2] an approach is shown
which exploits a priori knowledge in both channel and source
decoding, where an additional correction of the a priori
probabilities leads to an improvement in the parameter SNR. Here this
approach is extended to iterative source and channel decoding.


Fig. 1: Block diagram of iterative source/channel decoding
(with a bit-level part and a symbol-level part).


In the following, Fig. 1 is explained by an example. A Gaussian
distributed parameter with time correlation $\rho = 0.8$ is
Lloyd-Max quantized with 8 levels, and these levels are assigned
to 3 bits $x_i$ with folded binary mapping. Twenty parameters
are then placed within one frame. They are not correlated with
each other, but in time from frame to frame. The interleaver
places the 20 MSBs of the correlated parameters in the middle of
the frame; to the left and to the right it first places the
20 mid bits (10 on each side) and then the LSBs. Finally, 20 bits
are put at the beginning and 20 at the end of the block (dummy
bits), so that the influence of the definite start and termination
of the code can be neglected. This leads to a block length
of 100 bits, which are coded by a rate-1/2, constraint-length-4
recursive systematic convolutional code and transmitted over
an AWGN channel.
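
The frame layout described above can be sketched as follows. The text fixes only the ordering of the bit classes (dummy, LSB, mid, MSB, mid, LSB, dummy); which parameters contribute to the left and right halves, the order of bits within each group, and the value of the dummy bits are assumptions made for this example, and the function name is hypothetical.

```python
def build_frame(params):
    """Arrange 20 three-bit symbols (msb, mid, lsb) into one 100-bit block."""
    assert len(params) == 20
    msb = [p[0] for p in params]
    mid = [p[1] for p in params]
    lsb = [p[2] for p in params]
    dummy = [0] * 20  # assumed value of the dummy bits
    # Assumed split: parameters 0..9 feed the left half, 10..19 the right half.
    return (dummy + lsb[:10] + mid[:10] + msb
            + mid[10:] + lsb[10:] + dummy)


frame = build_frame([(1, 0, 0)] * 20)
print(len(frame))       # 20 + 10 + 10 + 20 + 10 + 10 + 20 positions
print(frame[40:60])     # the 20 MSBs sit in the middle of the block
```

Placing the most significant bits in the middle keeps them farthest from the block edges, which matches the stated aim of making start and termination effects negligible for the bits that matter most.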

Through this mapping and interleaving, a typical “Turbo”
decoding system is designed. There, the extrinsic information
was introduced. In the same manner we “subtract” (in the
log domain) the a priori information $P_{2,\mathrm{ext}}$ from $P_1(x_i \mid Y, X)$
after channel decoding, and the input into the source decoder,
$P_{1,\mathrm{ext}}$, from its output $P_2(x_i \mid X, Y)$. The capital $Y$, $X$ denote
the dependence of the bit $x_i$ on all channel values and
parameters, not only in the current frame but also in the previous
frames. The channel values $y$ and the probabilities denoting
the time correlation, $P(x \mid x(-1))$, are the two inputs of the
system. The output delivers a mean-square estimate of the
considered parameters.

* This work was done while all authors were with AT&T Labs
- Research.


Fig. 2: Combined Source and Channel Decoding (SaCD) using a
priori knowledge: comparison of the iterative approach (4 and 2
iterations), decoding with a priori knowledge (AK) in channel
decoding (CD) or source decoding (SD), and neglected AK
(horizontal axis: $E_s/N_0$ in dB).

From Fig. 2 the answer to the title question can be seen:
“A priori information can be used twice!” There is almost no
further gain from more iterations. The reason is that
in the source decoder only three bits denote one
symbol, and the correlation of this symbol with the previous one
resembles a very short code. The whole system might be
improved if there were correlation between parameters
within one frame. This approach can be implemented, e.g., in
a speech transmission standard. With some extensions this is
done for the ANSI-136 system [3].
Abstract - In the standard model with only pairwise
communication channels, unconditionally secure
broadcast among $n$ players is achievable if and only if
the number $t$ of corrupted players satisfies $t < n/3$. We
show that, when broadcast among each subset of three
players is additionally given, global broadcast is
achievable if $t < n/2$.

I.  Introduction 

Given a set $P = \{p_1, \ldots, p_n\}$ of $n$ players, the goal of
broadcast is to let some distinct player $d \in P$ (called the dealer)
reliably distribute a value to all players in $P$, i.e., all correct
(i.e., uncorrupted) players must receive the same value (agreement),
and if the dealer is correct then this must be the value the
dealer intended to distribute (validity).

In this paper we focus on broadcast protocols that are
unconditionally secure against an adversary that may actively
corrupt up to $t$ of the $n$ players. To actively corrupt a player
means to make him deviate from the protocol in an arbitrarily
malicious way. Unconditionally secure means that the correctness
of the protocol does not rely on any restriction on the power
of the adversary other than the threshold $t$ of players
he can corrupt during the protocol.

Since the network typically consists only of communication
channels among subsets of players and some of the players,
especially the dealer, may be corrupted by the adversary,
broadcast is a non-trivial problem.

Pease, Shostak, and Lamport [2] proved that, in
the standard communication model of a complete synchronous
network of pairwise authentic channels among each pair of
players, unconditionally secure broadcast is achievable if and
only if $t < n/3$. The communication model considered in this
paper extends this standard model by a synchronous network
of authentic broadcast for each subset $S \subseteq P$ of the players of
cardinality $|S| = 3$, i.e.,

• for every subset of three players and for any selection of
a dealer among them there is a broadcast channel, and

• for every such channel, all involved players are authentic,
i.e., every correct player is able to assign a received
message to its corresponding broadcast invocation.

A broadcast primitive or protocol for $n$ players that is secure
against $t$ corrupted players is called $(n,t)$-broadcast.

II.  Results 

Theorem 1 Given $(3,1)$-broadcast, $(n, \lfloor \frac{n-1}{2} \rfloor)$-broadcast is
achievable for any $n > 3$.

The basic idea is to take some known broadcast protocol
(e.g., [2]) for some virtual player set $Q$ ($|Q| = n'$) in the
standard model that tolerates $t' < n'/3$ corrupted players among $Q$
(where, for the moment, $n'$ can be supposed to be arbitrary,
i.e., not necessarily dependent on $k$). Instead of letting the
virtual players directly participate in the protocol, every virtual
player $q_i$ is simulated by some specific collection $S_i \subseteq P$ of
the actual players (according to player simulation in [1]). If
it can be achieved that at most $t' < n'/3$ players $q_i$ are
incorrectly simulated, then the protocol achieves broadcast among
the players in $Q$ (with respect to the players $q_j$ that are
correctly simulated). Finally, broadcast among the players in $P$
can then easily be derived from broadcast among the players
in $Q$.

1 Department of Computer Science, ETH Zurich, CH-8092
Zurich, Switzerland. E-mail: {fitzi,maurer}@inf.ethz.ch.
Research supported by the Swiss National Science Foundation, SPP
project no. 5003-045293.

The  following  proposition  immediately  follows  from  [1]. 

Proposition 1 A player $q_i \in Q$ of any protocol among a
player set $Q$ can be simulated correctly by a collection of
players $S$ if broadcast among the players in $S$ is possible and fewer
than $|S|/2$ players in $S$ are corrupted.

Proof of Theorem 1: The proof of this theorem is based on
a recursive construction that, for any $k > 0$, allows one to achieve
$(2k+3, k+1)$-broadcast from $(2k+1, k)$-broadcast.
$(3,1)$-broadcast can then be used as the base of the recursive
construction in order to achieve any $(n, \lfloor \frac{n-1}{2} \rfloor)$-broadcast.

Let $P$ be a set of $2k+3$ players and assume $(2k+1, k)$-broadcast
to be achievable among any $S \subseteq P$ with $|S| = 2k+1$.
We define a set $Q$ of $n' = \binom{2k+3}{2k+1}$ virtual players and
involve them in some standard broadcast protocol that tolerates
$t' < n'/3$ player corruptions. We now let every possible
collection $S \subseteq P$ of $|S| = 2k+1$ players from $P$ simulate exactly one
player $q_i \in Q$. Such a player $q_i$ is simulated correctly if at least
$k+1$ of the simulating players are correct themselves (since
$k+1$ constitutes a majority, broadcast among $S$
works correctly and hence Proposition 1 applies), i.e., at least
$k+1$ of the simulating players in $S$ must be corrupted by the
adversary in order to corrupt the corresponding virtual player.
Since the adversary corrupts at most $k+1$ of the $2k+3$ players,
such a collection must contain all corrupted players and omit
two of the correct ones. Hence at most $t' \le \binom{k+2}{2}$ players of the
original protocol can be corrupted, namely those given by all
simulating collections $S \subseteq P$ of cardinality $|S| = 2k+1$ that
include at least $k+1$ corrupted players. Since $k > 0$ we get

$$\frac{n'}{t'} \ge \frac{\binom{2k+3}{2}}{\binom{k+2}{2}} = \frac{2(2k+3)}{k+2} = 4 - \frac{2}{k+2} > 3,$$

and hence strictly less than a third of the players in the
original protocol are corrupted. Finally we can let every simulated
player send his result to every simulating player, who then can
compute the outcome of the broadcast by a majority vote
on all received values. ■
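
The counting step in the proof can be checked by brute force for small $k$. The sketch below enumerates all simulating collections of size $2k+1$, counts those containing at least $k+1$ corrupted players when the adversary corrupts $k+1$ of the $2k+3$ players, and compares the result with the one-third threshold. The function name and the choice of which players are corrupted are illustrative assumptions; by symmetry the count does not depend on that choice.

```python
from itertools import combinations
from math import comb


def corrupted_virtual_players(k):
    """Count size-(2k+1) subsets of 2k+3 players holding at least k+1 corrupted players."""
    players = range(2 * k + 3)
    corrupted = set(range(k + 1))  # worst case: the adversary corrupts k+1 players
    return sum(1 for s in combinations(players, 2 * k + 1)
               if len(corrupted.intersection(s)) >= k + 1)


for k in range(1, 6):
    n_virtual = comb(2 * k + 3, 2 * k + 1)
    t_virtual = corrupted_virtual_players(k)
    print(k, n_virtual, t_virtual, n_virtual / t_virtual)
```

For $k = 1$ this gives 10 virtual players of which 3 can be corrupted, so the ratio is above 3 as the proof requires; the margin grows with $k$ and approaches 4.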
Abstract - A new technique for secure information
transmission in a mobile environment using the short-term
reciprocity of the radio channel was described
in [1]. This paper evaluates the performance and
security aspects of the technique.

I.  System  Model  and  Algorithm  Review 

Users A and B, at least one of them being mobile, must communicate
in a secure manner in the presence of an adversary $\mathcal{E}$ on a
common wireless medium. Assuming that B is transmitting information
to A, the communication is achieved in two steps. The first step
involves a transmission of M sinusoids at frequencies
$f_1, f_2, \ldots, f_M$ with equal phases and equal energies from user
A to user B. The signal transmitted by A in the k-th signaling
interval $(kT, (k+1)T]$ is given by
$s_A(t) = \sum_{i=1}^{M} \sqrt{2E/T}\,\cos(2\pi f_i t + \phi)$.
The mobile channel is a time-varying fading channel with additive
white Gaussian noise. The sinusoids $\cos(2\pi f_i t)$ are separated
by at least the coherence bandwidth of the channel. The receiver
differentially estimates the $(M-1)$ received phase differences
$(\Theta_2(k) - \Theta_1(k)), \ldots, (\Theta_M(k) - \Theta_1(k))$
between the various sinusoids. Now, B has probed the response of the
channel to the transmission of these multiple sinusoids. In the next
step, the knowledge of this response is used by user B to transmit
information to user A. This is done by transmitting a signal
consisting of sinusoids of the same frequencies but with the phases
of each of the sinusoids modified so as to control the phase
differences received by user A to fall within one of R decision
regions depending on the information symbols to be transmitted. The
signal transmitted by B is given by
$s_B(t) = \sum_{i=1}^{M} \sqrt{2E/T}\,\cos(2\pi f_i t - (\Theta_i - \Theta_1) + \Psi_i)$,
$\Psi_1 = 0$,
$\Psi_i \in \{-\pi, -\pi + 2\pi/R, \ldots, -\pi + 2(R-1)\pi/R\}$,
$i \in \{2, \ldots, M\}$, where $\Theta_i - \Theta_1$ are the phase
differences detected from A's transmission, and $\Psi_i$ is
determined by the information to be transmitted and the mapping
between each decision region and the information bits. The signal
that is received by A is now given by
$r_A(t) = \sum_{i=1}^{M} \sqrt{\Gamma}\,\cos(2\pi f_i t + \Psi_i) + n(t)$,
$\Psi_1 = 0$,
$\Psi_i \in \{-\pi, -\pi + 2\pi/R, \ldots, -\pi + 2(R-1)\pi/R\}$,
$i \in \{2, \ldots, M\}$.
User A detects the $M - 1$ phase differences $\Psi_i$,
$i = 2, \ldots, M$, and for each phase difference it decodes the
corresponding information symbols.
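
The two-step exchange above can be illustrated with a minimal noise-free sketch. It assumes that the channel only adds a phase $\Theta_i$ to tone i, that this phase is identical in both directions (reciprocity), and that the eavesdropper sees independent phases; amplitudes and noise are ignored. The function names and the nearest-point decoding rule are illustrative choices, not taken from [1].

```python
import math
import random

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def constellation(R):
    """The R allowed phase differences -pi + 2*pi*r/R, r = 0..R-1."""
    return [-math.pi + 2 * math.pi * r / R for r in range(R)]

def decode(phase_diff, R):
    """Index of the constellation point closest (on the circle) to phase_diff."""
    points = constellation(R)
    return min(range(R), key=lambda r: abs(wrap(phase_diff - points[r])))

def exchange(symbols, R, theta_ab, theta_eve):
    """One probe/reply round over M = len(symbols) + 1 tones (noise-free).

    theta_ab[i]  : phase the A<->B channel adds to tone i (same both ways)
    theta_eve[i] : phase the B->eavesdropper channel adds to tone i
    """
    M = len(symbols) + 1
    # Step 1: A probes with equal phases; B measures Theta_i - Theta_1.
    measured = [wrap(theta_ab[i] - theta_ab[0]) for i in range(M)]
    # Step 2: B rotates tone i by -(Theta_i - Theta_1) + Psi_i, with Psi_1 = 0.
    points = constellation(R)
    psi = [0.0] + [points[s] for s in symbols]
    tx = [psi[i] - measured[i] for i in range(M)]
    # A sees the reciprocal channel phases; the eavesdropper sees its own.
    rx_a = [tx[i] + theta_ab[i] for i in range(M)]
    rx_e = [tx[i] + theta_eve[i] for i in range(M)]
    at_a = [decode(rx_a[i] - rx_a[0], R) for i in range(1, M)]
    at_e = [decode(rx_e[i] - rx_e[0], R) for i in range(1, M)]
    return at_a, at_e
```

Under these assumptions A always recovers the transmitted symbol indices, while the eavesdropper's decisions depend on phase differences it cannot observe.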

II.  Performance  and  Security 

A symbol error is made on reception by A if the total phase error
due to the phase errors $\Phi_e^B$ at B and the phase error
$\Phi_e^A$ at A forces the i-th phase difference at A to fall within
a region other than the desired region. It can be shown that the
conditional probability density function of $\Phi_e^A$ and
$\Phi_e^B$ is given by
$p_{\Phi_e}(\phi \mid \Gamma) = \frac{1}{2\pi}\exp\{-\Gamma\} + \sqrt{\Gamma/\pi}\,\cos\phi \cdot \exp\{-\Gamma\sin^2\phi\}\,[1 - Q(\sqrt{2\Gamma}\cos\phi)]$,
where

A2  A2 

r  =  Ai\2i  7^-  The  probability  density  function  Pri'l)  can 
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be  shown  to  be  ft  (7)  =  /„*  =ji  expj-S^-y  j  dx, 

where  7  =  Aa1  E /Na  is  the  average  signal  to  noise  ratio.  Now, 
the probability density function of $\Phi_e = \Phi_e^A + \Phi_e^B$ is
obtained as
$p_{\Phi_e}(\phi) = \int_0^\infty p_{\Phi_c}(\phi \mid \Gamma)\, f_\Gamma(\gamma)\, d\gamma$,
where $p_{\Phi_c}(\phi \mid \Gamma)$ is the convolution of two
identical density functions for $\Phi_e^A$ and $\Phi_e^B$. Note that
this sum can take values in $[-2\pi, 2\pi)$. The probability of
symbol error may be obtained from the above density function after
resolving the ambiguities of $2\pi$. For R = 2, the performance is
close to that of differential PSK in a flat Rayleigh fading channel.

The security of the proposed method depends completely on two basic
assumptions: the reciprocity assumption and the spatial decorrelation
assumption. If the phase differences at the intended users' locations
and the adversary's location are statistically independent, then the
amount of work required to break the system approaches that of a
simple exhaustion of trials of the cryptovariable. To motivate the
reciprocity assumption under mobility, consider a mobile with a speed
of 100 km/hr using a carrier in the 900 MHz region; with a delay of
10 µs, the distance moved by the mobile would be 0.00028 m, which is
negligible compared to the wavelength of 0.33 m. To motivate the
assumption of phase independence, let the distance between the
locations of $\mathcal{E}$ and B be many wavelengths, let $\Phi^E$ be
$\mathcal{E}$'s estimate of $\Theta_i - \Theta_1$ and define
$\Psi = (\Theta_i - \Theta_1) - \Phi^E$.

Then, $\Psi$ is a random variable with a probability density function
that is a function of
$\alpha^2 = J_0^2(\omega_D \tau)/(1 + (\omega_1 - \omega_2)^2\sigma^2)$,
where $J_0(\cdot)$ is the Bessel function of order 0, $\omega_D \tau$
is the Doppler times delay between the received phases used in
computing the phase differences, and $\sigma$ is a time delay spread
parameter that ranges between 1/4 microseconds for suburban areas and
5 microseconds for urban areas [2]. It was shown in [2] that for
$\alpha^2 < 0.4$, $\Psi$ is almost uniformly distributed. That is, if
the bandwidth and time delay between transmissions satisfy
$\alpha^2 < 0.4$, the phases are independent when $\mathcal{E}$ and B
are separated by many wavelengths. This value of $\alpha^2 = 0.4$ is
achieved for $\omega_1 - \omega_2 = 240$ kHz, if $\omega_D = 200$ Hz
and $\sigma = 5$ µs, as is the case for a fast-moving mobile
terminal. Note that $\tau = 0$ is chosen in this computation since
there is no delay between the received phases used in computing the
phase differences. To compute the rate at which we may transmit
information securely, we compute the value of $\tau$ for which
$\alpha^2 < 0.4$ with the above parameters and with
$\omega_1 - \omega_2 = 0$, since the phases at the same frequency
must be sufficiently decorrelated in time. The rate is then
calculated as $(M - 1)/\tau$. For the above scenario, with two tones,
a rate of 156 bits per second is possible.
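
The figures quoted in this section can be checked numerically. The sketch below assumes the correlation has the form $\alpha^2 = J_0^2(\omega_D\tau)/(1 + (\omega_1 - \omega_2)^2\sigma^2)$ and plugs the quoted values 240 kHz and 200 Hz directly in as $\omega_1 - \omega_2$ and $\omega_D$; that reading of the units is an assumption made because it reproduces the stated numbers. $J_0$ is evaluated from its power series so that no external library is needed.

```python
import math

def bessel_j0(x, terms=40):
    """J_0(x) from its power series; adequate for the small arguments used here."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total

def alpha_squared(omega_d, tau, delta_omega, sigma):
    """alpha^2 = J_0(omega_d*tau)^2 / (1 + (delta_omega*sigma)^2)."""
    return bessel_j0(omega_d * tau) ** 2 / (1.0 + (delta_omega * sigma) ** 2)

def decorrelation_delay(omega_d, threshold=0.4):
    """Smallest tau with alpha^2 <= threshold when delta_omega = 0 (bisection)."""
    lo, hi = 0.0, 2.0 / omega_d          # J_0 is decreasing on [0, 2]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if alpha_squared(omega_d, mid, 0.0, 0.0) > threshold:
            lo = mid
        else:
            hi = mid
    return hi

def displacement(speed_kmh, delay_s):
    """Distance in metres travelled at speed_kmh during delay_s seconds."""
    return speed_kmh / 3.6 * delay_s

if __name__ == "__main__":
    print("moved [m]      :", displacement(100.0, 10e-6))
    print("wavelength [m] :", 299792458.0 / 900e6)
    print("alpha^2, tau=0 :", alpha_squared(200.0, 0.0, 240e3, 5e-6))
    tau = decorrelation_delay(200.0)
    print("rate [bit/s]   :", (2 - 1) / tau)   # M = 2 tones
```

With these inputs the frequency-separation case gives $\alpha^2$ of roughly 0.41 and the two-tone rate comes out close to the 156 bits per second quoted above.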
Abstract — A new identity-based conference key distribution scheme
is presented using Harn-Yang's identity-based digital signature
scheme.

I.  Introduction 

In 1993, Harn and Yang proposed ID-based cryptographic schemes for
user identification, digital signature, and key distribution [1].
Here we extend their key distribution scheme to a conference key
distribution scheme using Harn-Yang's identity-based digital
signature scheme.

II.  NEW  IDENTITY-BASED  CONFERENCE  KEY 
DISTRIBUTION  SCHEME 

The  new  scheme  consists  of  three  phases:  the  initiation  phase, 
the  user  registration  phase,  and  the  application  phase. 

Initiation phase: The key authentication center (KAC) selects a
one-way function f, a large prime p, and a primitive element
$\alpha$ of GF(p), which are made known to the public. A random
number $x \in [1, p-1]$, with $\gcd(x, p-1) = 1$, is selected as the
KAC's secret key. The KAC calculates its public key as follows.

$y = \alpha^{x} \pmod{p}$.  (1)

User registration phase: When a user, say i, registers with the KAC,
he presents his identity $ID_i$ to the KAC. The KAC computes for
user i an extended identity $EID_i = f(ID_i)$ and the signature
$(r_i, s_i)$ of $EID_i$ as

$s_i = (EID_i - k_i r_i)\, x^{-1} \pmod{p-1}$  (2)

where $r_i = \alpha^{k_i} \pmod{p}$ and $k_i$ is chosen randomly from
$[1, p-1]$ such that $\gcd(s_i, p-1) = 1$. Note that no $k_i$ should
be used repeatedly. In the application phase, $s_i$ is user i's
secret key and $r_i$ is user i's public key.

Application phase: Suppose all n users have registered with the KAC,
and they are connected in a star network. Without loss of
generality, we assume user 1 is the chairman, and he collects and
delivers messages between himself and user j ($2 \le j \le n$). In
addition, all n users share a conventional encryption algorithm
$E_K(\cdot)$, where K is their shared key.

•  User 1 chooses a random number $v_1 \in [1, p-1]$ such that
$\gcd(v_1, p-1) = 1$. So, there exists $v_1^{-1}$ such that
$v_1 v_1^{-1} = 1 \pmod{p-1}$. Then, user 1 calculates

$w_1 = y^{v_1} \pmod{p}$

$\eta_1 = (m - v_1 w_1)\, s_1^{-1} \pmod{p-1}$  (3)

where $m = f(ID_1, \text{time})$. User 1 sends
$(ID_1, r_1, w_1, \eta_1)$ to user j ($2 \le j \le n$).

This work was partially done when the author was visiting
Eindhoven University of Technology.


•  Upon receiving $(ID_1, r_1, w_1, \eta_1)$, user j checks whether
the following congruence holds:

$y^{m} = w_1^{w_1} (\alpha^{EID_1} r_1^{-r_1})^{\eta_1} \pmod{p}$.  (4)

If (4) holds, user j chooses a random number $v_j \in [1, p-1]$ such
that $\gcd(v_j, p-1) = 1$. So there exists $v_j^{-1}$ such that
$v_j v_j^{-1} = 1 \pmod{p-1}$. Then user j computes

$w_j = y^{v_j} \pmod{p}$

$n_j = w_1^{v_j} \pmod{p}$  (5)

$\eta_j = (n_j - v_j w_j)\, s_j^{-1} \pmod{p-1}$.

Next, user j sends $(ID_j, r_j, w_j, n_j, \eta_j)$ to user 1.

•  Upon receiving $(ID_j, r_j, w_j, n_j, \eta_j)$, user 1 checks
whether the following $(n - 1)$ congruences hold:

$y^{n_j} = w_j^{w_j} (\alpha^{EID_j} r_j^{-r_j})^{\eta_j} \pmod{p}$.  (6)

If all the congruences hold, user 1 generates a random number
$r \in [1, p-1]$ and calculates the conference key $K_c$ as follows.

$K_c = y^{r} \pmod{p}$.  (7)

Also, user 1 computes

$z_j = n_j^{v_1^{-1} r} \pmod{p}$,  (8)

and sends $(z_j, E_{K_c}(ID_1))$ to all other users, where
$E_{K_c}(ID_1)$ denotes a conventional encryption of $ID_1$ using
$K_c$.

•  User j ($2 \le j \le n$) computes the conference key

$K_c = (z_j)^{v_j^{-1}} \pmod{p}$,  (9)

and verifies it through decryption of $E_{K_c}(ID_1)$.

Through  the  above  scheme,  each  user  can  obtain  the  same 
conference  key  Kc.  Since  the  conference  key  depends  on  the 
random  number  r,  Kc  will  be  different  from  one  time  to  the 
next. 
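
The three phases can be traced end to end with toy parameters. The sketch below is only an illustration: the prime p = 467 with primitive element 2, the SHA-256-based stand-in for the one-way function f, the user names, and all function names are hypothetical choices, and the parameters are far too small for real use. Exponents are reduced modulo p - 1, as in equations (2), (3), (5) and (8).

```python
import hashlib
import math
import random

P = 467          # toy prime (hypothetical; far too small for real use)
Q = P - 1
ALPHA = 2        # a primitive element of GF(467)

def f(*parts):
    """Stand-in for the public one-way function f (hypothetical choice)."""
    digest = hashlib.sha256(repr(parts).encode()).digest()
    return int.from_bytes(digest, "big") % Q

def unit(rng):
    """Random v in [1, p-1] with gcd(v, p-1) = 1."""
    while True:
        v = rng.randrange(1, P)
        if math.gcd(v, Q) == 1:
            return v

def register(x, identity, rng):
    """KAC side of (2): returns (r_i, s_i) with gcd(s_i, p-1) = 1."""
    eid = f(identity)
    while True:
        k = rng.randrange(1, P)
        r = pow(ALPHA, k, P)
        s = (eid - k * r) * pow(x, -1, Q) % Q
        if math.gcd(s, Q) == 1:
            return r, s

def sign(y, s, v, n):
    """w = y^v mod p and eta = (n - v*w) * s^{-1} mod (p-1)."""
    w = pow(y, v, P)
    return w, (n - v * w) * pow(s, -1, Q) % Q

def verify(y, identity, r, w, eta, n):
    """Check y^n = w^w * (alpha^EID * r^{-r})^eta mod p, as in (4) and (6)."""
    base = pow(ALPHA, f(identity), P) * pow(r, -r, P) % P
    return pow(y, n, P) == pow(w, w, P) * pow(base, eta, P) % P

def run_conference(n_users, seed=0):
    """Return (chairman's key, list of the keys derived by users 2..n)."""
    rng = random.Random(seed)
    x = unit(rng)                                  # KAC secret key
    y = pow(ALPHA, x, P)                           # KAC public key, (1)
    ids = ["user%d" % j for j in range(1, n_users + 1)]
    creds = {i: register(x, i, rng) for i in ids}
    r1, s1 = creds[ids[0]]
    m = f(ids[0], "time-stamp")
    v1 = unit(rng)
    w1, eta1 = sign(y, s1, v1, m)                  # (3)
    t = rng.randrange(1, P)                        # the random number r of (7)
    kc = pow(y, t, P)
    member_keys = []
    for ident in ids[1:]:
        rj, sj = creds[ident]
        assert verify(y, ids[0], r1, w1, eta1, m)  # (4), checked by user j
        vj = unit(rng)
        nj = pow(w1, vj, P)                        # (5)
        wj, etaj = sign(y, sj, vj, nj)
        assert verify(y, ident, rj, wj, etaj, nj)  # (6), checked by user 1
        zj = pow(nj, pow(v1, -1, Q) * t % Q, P)    # (8)
        member_keys.append(pow(zj, pow(vj, -1, Q), P))   # (9)
    return kc, member_keys
```

Calling `run_conference(5)` returns the chairman's key together with four member keys, all of which coincide because $z_j^{v_j^{-1}} = y^{r}$.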

References 

[1]  L. Harn and S. Yang, "ID-based cryptographic schemes for user
identification, digital signature, and key distribution", IEEE
Journal on Selected Areas in Communications, Vol. 11, No. 5,
June 1993, pp. 757-760.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


An  Information  Theoretic  Model  for 
Distributed  Key  Distribution 


Carlo  Blundo 

Dipartimento  di  Informatica  ed 
Applicazioni 
Universita  di  Salerno 
84081  Baronissi  (SA),  Italy 
e-mail: carblu@dia.unisa.it

Abstract — A Distributed Key Distribution Scheme is a protocol
enabling a set of n servers of a network to jointly realize a Key
Distribution Center, a server which distributes cryptographic keys
to users for secure group communications. We model Distributed Key
Distribution Schemes within an information theoretic framework,
showing lower bounds on the size of the information sent and stored
by the servers and on the number of random bits needed to set up
such schemes. The bounds are tight, as there exists a protocol which
meets them.

I.  Introduction 

Enabling groups of users in a network (conferences) to privately
communicate using symmetric encryption algorithms requires an
efficient protocol to give each conference a key.

Often, in a network, there exists a Key Distribution Center which
is responsible for the management of the secret keys. If the center
works on-line, then users must send it requests to obtain the
common key. If the center is off-line, then the common keys can be
recovered by the conferences using some private information
initially distributed by the center. The protocols implemented by
the Key Distribution Center are called Key Distribution Schemes
(KDSs).

All previous KDSs are concerned with a centralized environment.
With a Distributed Key Distribution Scheme (DKDS), the secret keys
are distributed among n servers, and a key can be recovered by a
user only if he obtains answers to a key-request message sent to k
out of the n servers. The distribution avoids the concentration of
secret information in a single place of the network and increases
the availability and security of the overall system. We are
interested in unconditionally secure DKDSs.

II.  The  Model 

Initially,  a  dealer  distributes  private  information  to  each  server 
of  the  network. 

-  Let $\mathcal{U} = \{1, \ldots, m\}$ be a set of users, let
$S_1, \ldots, S_n$ be the servers of the network, and let
$\mathcal{C}$ be the family of all conferences of $\mathcal{U}$
that need to communicate securely.

-  Let $K_h$ be the set of possible keys $\kappa_h$ that can be
computed by the users in $C_h \in \mathcal{C}$ (let $\mathbf{K}_h$
be the corresponding random variable).

-  Let $A_i$ be the set of values that the server $S_i$ can obtain
privately from the dealer during the initialization phase.
-  Let $Y^h_{i,j}$ be the set of values that can be sent by the
server $S_i$ to user $j \in \mathcal{U}$ upon a key-request message
for the conference $C_h$ (let $\mathbf{A}_i$ and
$\mathbf{Y}^h_{i,j}$ be the corresponding random variables). Let
$\mathbf{Y}^s$ collect the variables $\mathbf{Y}^s_{i,j}$, for
$s \ne h$, for $i = 1, \ldots, n$ and $j \in \mathcal{U}$, and let
$\mathbf{Y} = \mathbf{Y}^1, \ldots, \mathbf{Y}^{h-1}, \mathbf{Y}^{h+1}, \ldots, \mathbf{Y}^{|\mathcal{C}|}$.

Definition II.1  A $(k, n, \mathcal{C})$-DKDS is a protocol which
enables each user of $C_h \in \mathcal{C}$ to compute a common key
$\kappa_h$ interacting with at least k of the n servers of the
network. More precisely:

-  For each conference $C_h \in \mathcal{C}$, for each user
$j \in C_h$, and for each subset of indices
$\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$, it holds that

$H(\mathbf{K}_h \mid \mathbf{Y}^h_{i_1,j}, \ldots, \mathbf{Y}^h_{i_k,j}) = 0$.

-  For each conference $C_h \in \mathcal{C}$ and for each subset of
indices $\{i_1, \ldots, i_{k-1}\} \subseteq \{1, \ldots, n\}$, it
holds that

$H(\mathbf{K}_h \mid \mathbf{Y}, \mathbf{A}_{i_1}, \ldots, \mathbf{A}_{i_{k-1}}) = H(\mathbf{K}_h)$.

The first property of the above definition establishes that each
user in a conference $C_h \in \mathcal{C}$ can uniquely compute the
key $\kappa_h$ after interacting with at least k servers of his
choice. The second property formalizes the security condition.
W.l.o.g., we assume that all the entropies on keys are equal, and
we denote this common entropy by $H(\mathbf{K})$.

III.  Our  Results 

Theorem III.1  In a $(k, n, \mathcal{C})$-DKDS, for each
$C_h \in \mathcal{C}$, and for $i = 1, \ldots, n$ and
$j \in \mathcal{U}$, it holds that
$H(\mathbf{Y}^h_{i,j}) \ge H(\mathbf{K})$.

Theorem III.2  Let $\mathbf{A}_1, \ldots, \mathbf{A}_n$ be the
private information of $S_1, \ldots, S_n$. Then,
$H(\mathbf{A}_i) \ge |\mathcal{C}|\, H(\mathbf{K})$, for each
$i = 1, \ldots, n$.

Theorem III.3  Let $\mathbf{A}_1, \ldots, \mathbf{A}_n$ be the
private information of $S_1, \ldots, S_n$. Then,
$H(\mathbf{A}_1, \ldots, \mathbf{A}_n) \ge k\, |\mathcal{C}|\, H(\mathbf{K})$.

All the previous bounds are tight. Indeed, using multiple copies of
Shamir's Secret Sharing Scheme [2], we can construct a protocol
that meets the bounds. Moreover, the scheme described in [1] is
also optimal with respect to the information distributed.
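
One way to read "multiple copies of Shamir's Secret Sharing Scheme" is sketched below: the dealer draws one random polynomial of degree k - 1 per conference, server $S_i$ stores the value of every polynomial at the point i, and the key of a conference is the polynomial's value at 0. This arrangement, the field size and the function names are assumptions made for illustration, not the construction of [1] or [2] verbatim. Each server then stores $|\mathcal{C}|$ field elements and the dealer uses $k|\mathcal{C}|$ random field elements, matching Theorems III.2 and III.3 when keys are uniform over the field.

```python
import random

Q = 2**31 - 1   # prime field size (hypothetical choice); keys live in GF(Q)

def evaluate(coeffs, x):
    """Evaluate a polynomial with the given coefficients at x (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc

def deal(k, n, conferences, rng):
    """Dealer: one random degree-(k-1) polynomial per conference.

    Returns (servers, keys), where servers[i][h] is the share of conference h
    held by server i (i = 1..n) and keys[h] is the conference key.
    """
    polys = {h: [rng.randrange(Q) for _ in range(k)] for h in conferences}
    servers = {i: {h: evaluate(polys[h], i) for h in conferences}
               for i in range(1, n + 1)}
    keys = {h: polys[h][0] for h in conferences}   # key = polynomial at 0
    return servers, keys

def recover(answers):
    """Lagrange interpolation at 0 from a dict {server index: share}."""
    key = 0
    for i, share in answers.items():
        num, den = 1, 1
        for j in answers:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        key = (key + share * num * pow(den, -1, Q)) % Q
    return key
```

A user in conference h who receives `servers[i][h]` from any k servers can call `recover` on those answers; any k - 1 shares of a degree-(k - 1) polynomial leave the value at 0 undetermined.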

IV.  Open  Problems 

Further research could model: DKDSs with an initialization
performed by a subset of $S_1, \ldots, S_n$; DKDSs secure against
coalitions of users of fixed size; DKDSs in which a user's key
recovery is not based on a threshold structure but on a generic
access structure on $S_1, \ldots, S_n$.
Abstract — A novel adaptive Bayesian receiver for signal detection
in flat-fading channels is developed based on the sequential Monte
Carlo methodology. The basic idea is to treat the transmitted
signals as missing data and to sequentially impute multiple copies
of them based on the observed signals. The imputed signal sequences,
together with their importance weights, provide a way to
approximate the Bayesian estimate of the transmitted signals and
the channel states. It is shown through simulations that the
proposed sequential Monte Carlo receivers achieve near-bound
performance in fading channels without the aid of any
training/pilot symbols or decision feedback. Moreover, the proposed
receiver structure exhibits massive parallelism and is ideally
suited for high-speed parallel implementation using VLSI systolic
array technology.

I.  System  Description 

We consider a communication system signaling through a flat-fading
channel with additive ambient noise. The transmitted complex data
symbol $s_t$ takes values from a finite alphabet set
$\mathcal{A} = \{a_1, \ldots, a_{|\mathcal{A}|}\}$. The input-output
relationship of the flat-fading channel is described by

$y_t = \alpha_t s_t + n_t, \quad t = 0, 1, \ldots,$  (1)

where $y_t$, $\alpha_t$, $s_t$ and $n_t$ are the received signal,
the fading channel coefficient, the transmitted symbol, and the
ambient additive noise at time t, respectively. The processes
$\{\alpha_t\}$, $\{s_t\}$, and $\{n_t\}$ are assumed to be mutually
independent. It is assumed that the additive noise $\{n_t\}$ is a
sequence of independent and identically distributed (i.i.d.)
zero-mean complex Gaussian random variables:
$n_t \sim \mathcal{N}_c(0, \sigma^2)$. It is further assumed that
the channel-fading process is Rayleigh. That is, the fading
coefficients $\{\alpha_t\}$ form a complex Gaussian process that can
be modeled by the output of a lowpass Butterworth filter driven by
white Gaussian noise. This fading channel can be described by the
following state-space model

$x_t = F x_{t-1} + g u_t,$  (2)

$y_t = s_t h^H x_t + \sigma v_t,$  (3)

where $\{v_t\}$ in (3) is a white complex Gaussian noise sequence
with unit variance and independent real and imaginary components.
supported in part by the NSF grant CAREER CCR-9875314. J.S.
II.  The Mixture Kalman Filter Receiver

Denote $Y_t = (y_0, \ldots, y_t)$ and $S_t = (s_0, \ldots, s_t)$.
Assume that the transmitted symbols are independent and identically
distributed uniformly a priori. We are interested in estimating the
symbol $s_t$ and the channel state $\alpha_t = h^H x_t$ at time t
based on the observation $Y_t$. Note that with a given $S_t$, the
state-space model (2)-(3) becomes a linear Gaussian system. Hence,

$p(x_t \mid S_t, Y_t) \sim \mathcal{N}_c(\mu_t(S_t), \Sigma_t(S_t)),$  (4)

where the mean $\mu_t(S_t)$ and covariance matrix $\Sigma_t(S_t)$
can be obtained by a Kalman filter with the given $S_t$. The
adaptive receiver proposed in this paper is based on a recently
proposed filtering method, the mixture Kalman filter (MKF). The
basic idea is to obtain a set of Monte Carlo samples of the
transmitted symbols, $\{(S_t^{(j)}, w_t^{(j)})\}_{j=1}^{m}$,
properly weighted with respect to the distribution
$p(S_t \mid Y_t)$. Then for any integrable function $h(x_t, s_t)$,
we can approximate the quantity of interest
$E\{h(x_t, s_t) \mid Y_t\}$ as follows:


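$E\{h(x_t, s_t) \mid Y_t\} \cong \frac{1}{W_t} \sum_{j=1}^{m} w_t^{(j)} \int h(x_t, s_t^{(j)})\, \phi(x_t; \mu_t(S_t^{(j)}), \Sigma_t(S_t^{(j)}))\, dx_t,$

where $W_t = \sum_{j=1}^{m} w_t^{(j)}$, and $\phi(\cdot; \mu, \Sigma)$
denotes a complex Gaussian density function with mean $\mu$ and
covariance matrix $\Sigma$. In particular, the a posteriori symbol
probability can be estimated as

$P(s_t = a_i \mid Y_t) \cong \frac{1}{W_t} \sum_{j=1}^{m} 1(s_t^{(j)} = a_i)\, w_t^{(j)},$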


where $1(\cdot)$ is an indicator function. Denote
$\mu_t^{(j)} = \mu_t(S_t^{(j)})$,
$\Sigma_t^{(j)} = \Sigma_t(S_t^{(j)})$, and
$\kappa_t^{(j)} = [\mu_t^{(j)}, \Sigma_t^{(j)}]$. By exploiting the
Markovian nature of the state-space model (2)-(3), we can derive a
recursive procedure for generating a set of properly weighted Monte
Carlo samples (i.e.,
$\{(S_t^{(j)}, \kappa_t^{(j)}, w_t^{(j)})\}_{j=1}^{m}$) at time t,
with respect to $p(S_t \mid Y_t)$, from a set of properly weighted
Monte Carlo samples at time $(t - 1)$, which leads to an adaptive
receiver structure in fading channels based on Monte Carlo
filtering. Moreover, if the transmitted symbols are convolutionally
coded, then a similar Monte-Carlo-based adaptive receiver can be
developed that directly samples the information bits based on the
received signal. Simulation results indicate that a sample size of
50 suffices to obtain good receiver performance.
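
The weighted-sample estimate of the a posteriori symbol probability used by the receiver can be written in a few lines. The sketch below assumes that the current symbols $s_t^{(j)}$ and weights $w_t^{(j)}$ of the m samples are already available; how they are propagated by the mixture Kalman filter is not shown, and the function name is hypothetical.

```python
def symbol_posteriors(samples, weights, alphabet):
    """Estimate P(s_t = a | Y_t) for every a in the alphabet.

    samples[j] is the current symbol of the j-th Monte Carlo sample and
    weights[j] its importance weight; the weights need not be normalized,
    since the estimate divides by their sum W_t.
    """
    total = sum(weights)
    return {a: sum(w for s, w in zip(samples, weights) if s == a) / total
            for a in alphabet}

if __name__ == "__main__":
    post = symbol_posteriors([+1, -1, +1, +1], [2, 5, 1, 2], alphabet=(+1, -1))
    decision = max(post, key=post.get)   # symbol with the largest estimate
    print(post, decision)
```

Because the estimate is a ratio, only the relative sizes of the weights matter.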
Abstract — An optimal multiuser detector, in the weighted least
squares (WLS) sense, is derived for Code Division Multiple Access
(CDMA) and Space Division Multiple Access (SDMA) systems.

I.  Introduction 

Optimal detectors, e.g., the maximum likelihood detector with a
bank of matched filters (MF-ML) [1], require knowledge of many
parameters such as the number of users, their signature sequences,
and transmission delays. In this paper, we present an optimal WLS
detector that can be implemented adaptively without the knowledge
of these parameters. The WLS detector includes the MF-ML detector
as a special case and it also optimally suppresses narrowband
interference.

II.  WLS  Structure 

Let y(t) be a received CDMA or SDMA signal due to $K_T$ users.
Consider the fractionally chip-spaced received vector
$y = Sb + n$, where $y = [y_0, \ldots, y_{N_t-1}]^T$, $N_t$ is the
total number of received samples, $y_l = y(lT_{cr})$,
$T_{cr} = T_c/r$, $T_c$ is the chip period, and r is chosen to
satisfy the Nyquist sampling criterion. The matrix S equals
$[s_{1,0}, \ldots, s_{K_T,N_b-1}]$, where
$s_{k,i} = E[b_{k,i}^{*}\, y]$ is the signature vector for the i-th
bit of the k-th user, $b_{k,i}$, and $N_b$ is the total number of
bits transmitted by each user. S also includes the effects of the
multipath channel. The vector b equals
$[b_{1,0}, \ldots, b_{K_T,N_b-1}]^T$. The covariance matrix of the
noise vector n, which may also contain narrowband interference, is
$R_n$. Consider the problem of joint detection of K users, where
$K = K_T - K_I$, $K_I$ being the number of unknown CDMA/SDMA
interferers. We rewrite $Sb = S_D b_D + S_U b_U$, where $b_D$
contains the symbols of the K users and $S_D$ contains the
corresponding signature vectors. Similarly, $S_U$ and $b_U$
correspond to the $K_I$ users.

Proposition 1  The output samples of a bank of $KN_b$ minimum mean
squared error (MMSE) filters corresponding to each symbol of each
of the K users contain a set of sufficient statistics for WLS
detection.

The proof follows from the WLS detection criterion,

$\hat{b}_D = \arg\min_{b_D}\, (y - S_D b_D)^H R_U^{-1} (y - S_D b_D),$

where $R_U = S_U S_U^H + R_n$. The MMSE filter $p_{k,i}$
corresponding to $b_{k,i}$ is of the form
$p_{k,i} = R_y^{-1} s_{k,i}$.
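
For a handful of users the WLS criterion can be evaluated by exhaustive search, which makes the role of the weighting matrix concrete. The sketch below is a simplification under stated assumptions: all quantities are real-valued (so the Hermitian transpose becomes a plain transpose), symbols are binary, and the inverse interference-plus-noise covariance is passed in directly as `W`; the function names are hypothetical.

```python
from itertools import product

def wls_metric(y, S, W, b):
    """(y - S b)^T W (y - S b) for real vectors, with W playing the role of R_U^{-1}."""
    e = [y[r] - sum(S[r][c] * b[c] for c in range(len(b)))
         for r in range(len(y))]
    return sum(e[r] * W[r][c] * e[c]
               for r in range(len(e)) for c in range(len(e)))

def wls_detect(y, S, W):
    """Exhaustive search over b in {-1,+1}^K for the smallest WLS metric."""
    K = len(S[0])
    return min(product((-1, 1), repeat=K),
               key=lambda b: wls_metric(y, S, W, b))

if __name__ == "__main__":
    S = [[1, 0], [1, 1], [0, 1], [1, -1]]          # two toy signature columns
    W = [[1, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0], [0, 0, 0, 2]]
    y = [1.1, -0.2, -0.9, 2.0]                     # S @ (1, -1) plus a small error
    print(wls_detect(y, S, W))
```

The search cost grows as $2^K$, which is why the paper reduces detection to filter outputs followed by a sequence detector instead.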

Corollary  1.1  If  the  interfering  users’  symbols  are  Gaussian 
distributed,  and  n  is  Gaussian,  then  the  MMSE  filter  output 
samples  are  also  a  set  of  sufficient  statistics  for  ML  detection. 
Corollary  1.2  Only  a  bank  of  K  +  i(A*  +  (Aj>  —  A^')) 
MMSE  filters  is  required  for  generating  sufficient  statistics,  if 
the  MMSE  filter  corresponding  to  bk,i  is  of  the  form  p = 
[Oi,{,P*  >0i,ij]t  for  N'k  <  i  <  N'k,  where  N'k  and  Nk  are  in¬ 
tegers,  £  =  lo  +  (*  -  1  )rN,  rj  =  Nt  —  ( rNf  +  fo  +  (»  -  1  )rN) 

Part of the work was done at RSISE, Australian National University,
Canberra, Australia.
for some integer $l_0$, $0_{m,n}$ is an $m \times n$ zero matrix,
and $p_k$ is a vector of length $rN_f$. In practice, only K filters
are needed. Each user repeats its signature sequence for
consecutively transmitted symbols. Therefore, instead of
considering the whole received vector, a sliding-window received
signal vector y(i) of length $rN_f$ samples may be considered, so
that $y(i) = S(i)b(i) + n(i)$, where the argument i indicates that
it corresponds to the i-th symbol of all users. This sliding window
moves in steps of rN samples, where N is the spreading gain. The
WLS metric to be minimized then becomes

$\Lambda(b_D) = \sum_{k=1}^{K} \sum_{i=0}^{N_b-1} \left[ -2\,\mathrm{Re}[b_{k,i}^{*}\, p_k^H y(i)] + b_{k,i}^{*}\, q_{dd,k}^H b_{dd}(i) \right],$

where $q_{dd,k}$ represents interference contributions to y(i), in
the MMSE sense, from symbols of the K users except $b_{k,i}$, and
the vector $b_{dd}(i)$ contains these symbols. Re[ ] denotes the
real part.

III.  Adaptive  Implementation 

Consider a centralized decision feedback detector, with feedforward
filter (FFF) tap vector $w_{dfe,k}$ and feedback filter (FBF)
vector $d_{dfe,k}$ for the k-th user. We write
$S(i)b(i) = S_{ud}(i)b_{ud}(i) + S_{dd}(i)b_{dd}(i)$, where
$S_{dd}(i)$ contains windowed signature vectors corresponding to
symbols in $b_{dd}(i)$, and the remaining signature vectors and
symbols are contained in $S_{ud}$ and $b_{ud}$, respectively.

Proposition 2  The MMSE solution for the FFF and FBF taps is [2]:
$w_{dfe,k} = F_u^{-1} s_k(i)$, $d_{dfe,k} = S_{dd}^H(i)\, w_{dfe,k}$,
where $F_u = S_{ud}(i) S_{ud}^H(i) + R_n$ and
$R_n = E[n(i) n^H(i)]$.

This solution can be adaptively obtained using training symbols
from the K users. The FFF/FBFs are normalized with respect to one
of the users, say user 1. Define
$\beta_k = d_{dfe,1}(k,i)/d_{dfe,k}(1,i)$ for $1 \le k \le K$, where
$d_{dfe,k}(m,i)$ denotes the effect due to the m-th user's i-th
bit. 'Seeded' FFFs and FBFs are defined for $1 \le k \le K$ as
$\tilde{w}_{dfe,k} = \beta_k w_{dfe,k}$ and
$\tilde{d}_{dfe,k} = \beta_k d_{dfe,k}$. The WLS metric becomes

$\Lambda(b_D) = \sum_{k=1}^{K} \sum_{i=0}^{N_b-1} \left[ -2\,\mathrm{Re}[b_{k,i}^{*}\, \tilde{w}_{dfe,k}^H y(i)] + b_{k,i}^{*}\, \tilde{d}_{dfe,k}^H b_{dd}(i) \right].$

Ignoring the seeding factor $\beta_k$ when users have different
power levels results in performance loss [3]. The WLS data
detection is now performed by using the Viterbi algorithm [3].

IV.  Conclusions 

A  more  general  optimal  detector,  compared  to  the  ML  detec¬ 
tor,  is  presented  and  its  adaptive  version  is  realized. 
Abstract — We consider the problem of simultaneous parameter
estimation and data restoration in a synchronous CDMA system.
Bayesian inference of all unknown quantities is made from the
superimposed and noisy received signals. The Gibbs sampler, a
Markov Chain Monte Carlo procedure, is employed to calculate the
Bayesian estimates. The basic idea is to generate ergodic random
samples from the joint posterior distribution of all unknowns, and
then to average the appropriate samples to obtain the estimates of
the unknown quantities. Being "soft-input soft-output" in nature,
this technique is well suited for iterative processing in a coded
system, which allows the adaptive Bayesian multiuser detector to
refine its processing based on the information from the decoding
stage, and vice versa - a receiver structure termed the adaptive
Turbo multiuser detector.

I.  System  Description 

We consider a synchronous CDMA system with K users, employing
normalized modulation waveforms $s_1, s_2, \ldots, s_K$, and
signaling through a channel with additive white Gaussian noise. The
received signal is given by

$r(i) = \sum_{k=1}^{K} A_k\, x_k(i)\, s_k + n(i), \quad i = 0, \ldots, M - 1.$  (1)

In (1), M is the number of data symbols per user per frame; $A_k$,
$x_k(i)$ and $s_k$ denote respectively the amplitude, the i-th
symbol and the normalized spreading waveform of the k-th user;
$n(i) = [n_0(i)\ n_1(i)\ \cdots\ n_{P-1}(i)]^T$ is a zero-mean white
Gaussian noise vector, i.e., $n_j(i) \sim \mathcal{N}(0, \sigma^2)$,
where $\sigma^2$ is the variance of the noise. Define the following
a priori symbol probabilities

$\rho_k(i) = P[x_k(i) = +1], \quad i = 0, \ldots, M - 1; \quad k = 1, \ldots, K.$

Note that when no prior information is available, then
$\rho_k(i) = 1/2$, i.e., all symbols are equally likely.

Denote $Y = \{r(0), r(1), \ldots, r(M - 1)\}$. We consider the
problem of estimating the a posteriori probabilities of the
transmitted symbols

$P[x_k(i) = +1 \mid Y], \quad i = 0, \ldots, M - 1; \quad k = 1, \ldots, K,$

based on the received signals Y and the prior information
$\{\rho_k(i)\}$, without knowing the channel amplitudes
$\{A_k\}_{k=1}^{K}$ and the noise variance $\sigma^2$.
R. Chen was supported in part by the U.S. National Science
Foundation under grant DMS 9626113 and grant DMS-9982846.

II.  The Gibbs Multiuser Detector

We choose the following conjugate prior distributions for the
unknown parameters: p(a), $p(\sigma^2)$ and p(X). For the unknown
amplitude vector a, a truncated Gaussian prior distribution is
assumed,

$p(a) \propto \mathcal{N}(a_0, \Sigma_0)\, I_{\{a > 0\}}.$

For the noise variance $\sigma^2$, an inverse chi-square prior
distribution is assumed,

$p(\sigma^2) \sim \chi^{-2}(\nu_0, \lambda_0).$

Finally, the prior distribution p(X) can be expressed as

$p(X) = \prod_{i=0}^{M-1} \prod_{k=1}^{K} \rho_k(i)^{\delta_{ki}}\, [1 - \rho_k(i)]^{1 - \delta_{ki}},$

where $\delta_{ki}$ is the indicator such that $\delta_{ki} = 1$ if
$x_k(i) = +1$ and $\delta_{ki} = 0$ if $x_k(i) = -1$.

The Gibbs sampling implementation of the adaptive Bayesian
multiuser detector in Gaussian noise proceeds iteratively as
follows. Given the initial values of the unknown quantities
$\{a^{(0)}, \sigma^{2(0)}, X^{(0)}\}$ drawn from the above prior
distributions, and for $n = 1, 2, \ldots$

1.  Draw $a^{(n)}$ from $p(a \mid \sigma^{2(n-1)}, X^{(n-1)}, Y)$.

2.  Draw $\sigma^{2(n)}$ from $p(\sigma^2 \mid a^{(n)}, X^{(n-1)}, Y)$.

3.  For $i = 0, 1, \ldots, M - 1$

For $k = 1, 2, \ldots, K$

Draw $x_k(i)^{(n)}$ from
$P[x_k(i) \mid a^{(n)}, \sigma^{2(n)}, X_{ki}^{(n)}, Y]$,

where $X_{ki}^{(n)} = \{x(0)^{(n)}, \ldots, x(i-1)^{(n)},
x_1(i)^{(n)}, \ldots, x_{k-1}(i)^{(n)},
x_{k+1}(i)^{(n-1)}, \ldots, x_K(i)^{(n-1)},
x(i+1)^{(n-1)}, \ldots, x(M-1)^{(n-1)}\}$.
The conditional distributions in the above algorithm can be found
in closed form.
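
Step 3 of the procedure can be illustrated on its own. The sketch below is a deliberate simplification: the amplitudes and the noise variance are treated as known, so steps 1 and 2 are skipped, and a single prior probability `rho` strictly between 0 and 1 is used for all symbols. Under model (1) the conditional log-odds of $x_k(i) = +1$ are then $2A_k\, s_k^T e/\sigma^2$ plus the prior log-odds, where e is the received vector with the other users' current symbols removed. All function names are hypothetical.

```python
import math
import random

def prob_plus(delta, rho):
    """P[x = +1] from likelihood log-odds delta and prior rho, computed stably."""
    z = delta + math.log(rho / (1.0 - rho))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gibbs_symbols(r, S, A, sigma2, rho, n_burn, n_keep, rng):
    """Gibbs sampling of the symbols only (amplitudes and noise variance known).

    r[i] : received vector at symbol time i      S[k] : waveform of user k
    A[k] : amplitude of user k                   rho  : prior P[x_k(i) = +1]
    Returns the fraction of kept iterations in which x_k(i) = +1.
    """
    M, K, P = len(r), len(S), len(S[0])
    x = [[rng.choice((-1, 1)) for _ in range(K)] for _ in range(M)]
    counts = [[0] * K for _ in range(M)]
    for n in range(n_burn + n_keep):
        for i in range(M):
            for k in range(K):
                # Residual after removing the other users' current symbols.
                e = [r[i][p] - sum(A[l] * x[i][l] * S[l][p]
                                   for l in range(K) if l != k)
                     for p in range(P)]
                delta = 2.0 * A[k] * sum(S[k][p] * e[p] for p in range(P)) / sigma2
                x[i][k] = 1 if rng.random() < prob_plus(delta, rho) else -1
                if n >= n_burn and x[i][k] == 1:
                    counts[i][k] += 1
    return [[c / n_keep for c in row] for row in counts]
```

The returned fractions are the sample averages of the indicators over the kept iterations, which is the form of estimate used for the a posteriori symbol probabilities.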

To ensure convergence, the above procedure is usually carried out
for $(n_0 + N)$ iterations, and samples from the last N iterations
are used to calculate the Bayesian estimates of the unknown
quantities. In particular, the a posteriori symbol probabilities
are approximated as

$P[x_k(i) = +1 \mid Y] \cong \frac{1}{N} \sum_{n=n_0+1}^{n_0+N} \delta_{ki}^{(n)}.$

The above Bayesian multiuser detector can incorporate the a priori
symbol probabilities, and it produces as output the a posteriori
symbol probabilities. Hence it is very well suited for iterative
processing in a coded system, which allows the adaptive Bayesian
multiuser detector to refine its processing based on the
information from the decoding stage, and vice versa - a receiver
structure termed the adaptive Turbo multiuser detector.
Abstract — Multi-user detection of CDMA signals is studied in the
light of iterative processing. The complete factor graph of a coded
CDMA system is used to develop several successively less complex
joint detection algorithms whose performance is related to the
computational complexity of the algorithm.

Introduction 

We study the general structure of coded CDMA systems from an
iterative processing point of view, illuminating how the different
parts, in particular the FEC codec and the CDMA receiver, have to
interact with each other. We apply a series of simplifications to
the basic (graphical) structure of the receiver which result in
simpler algorithms with reduced performance. In the course of these
simplifications we rederive a number of previously proposed
receivers, such as the iterative receivers and linear metric
generation receivers. We show that if the receiver for a given user
does not know the code (FEC) of the other users, its code network
breaks into subnetworks, specifically, into a FEC decoder network
and a number of metric generation networks.

The optimal metric generator can easily be formulated, but is in
general too complex for most practical considerations. Hence, we
simplify this metric generation, which leads to a family of
low-complexity interference cancellers. In particular, linear metric
generation can be performed at the cost of further loss in
performance. There exist efficient methods for generating these
metrics iteratively [2].




Fig.  1:  Factor  Graph  of  a  complete  coded  CDMA  system. 
'Supported  in  part  by  NSF  Grant  No.  CCR-9732962. 


For illustration, assume that the encoders are rate R = 1/2
convolutional encoders. Using the factor graph representation
for a convolutional code [1], a factor graph for the
complete decoder can be drawn; it is shown in Figure 1.
(Note that there are other ways of drawing the code network
graph; in particular, the multiple-access node can be expanded
into a complete trellis diagram describing the multiple-access
interference between the symbols d.)

We refer to detection in an interference-limited environment
as interference cancellation whenever full joint detection is not
possible. That is, we assume that the receiver for the target
user, say user K, has no knowledge about the FEC system of
the interfering users and can therefore not decode their data
streams. Since knowledge of the code of the interfering users
is not available, the network structure of their coding system
is also unknown, and we have to truncate the receiver network
at the transmitted symbol nodes of the interferers, using the
a priori distributions for them. The resulting network structure
is shown in Fig. 2.




Fig. 2: Factor Graph of the coded CDMA system without knowledge
of the FEC networks of the interfering users.

While the algorithm now no longer has to visit the FEC
code networks for the K − 1 (interfering) users, the problem
with the large multiple-access node arc incidence persists.
Abstract — We show that the probability of undetected
error for a quantum code on the depolarizing
channel can be expressed via the code's weight
enumerators. We prove that there exist quantum codes
whose probability of undetected error falls exponentially
with the length of the code and derive a lower
bound on this exponent. To derive upper bounds we
formulate a linear programming problem and present
two feasible programs for it. The asymptotic upper
and lower bounds coincide in a certain interval of code
rates close to 1.


I.  Introduction 

A quantum code $Q$ with parameters $((n,K))$ is a $K$-dimensional linear
subspace of the space $\mathcal{H} = \mathbb{C}^{2^n}$ [2]. The number
$R = (\log_2 K)/n$ is called the rate of $Q$. During transmission over the
channel a quantum state $v \in \mathcal{H}$ can be altered by an error operator

$$E = \tau_1 \otimes \tau_2 \otimes \cdots \otimes \tau_n, \qquad (1)$$

where each $\tau_i$ is, up to a phase factor, one of $I_2, \sigma_x, \sigma_z, \sigma_y$,
and $\sigma_x, \sigma_y, \sigma_z$ are the Pauli matrices. Under the action of $E$ the
“transmitted” state is transformed to $Ev$. The number of nonidentity matrices in
(1) is called the weight of the error, $\mathrm{wt}(E)$. By the definition
of the channel, the probability of an error operator $E$ equals

$$(p/3)^{\mathrm{wt}(E)} (1-p)^{n-\mathrm{wt}(E)}.$$

Let $P$ be the orthogonal projection on the code $Q$. “Weight
enumerators” associated with $Q$ have the form
$B(x,y) = \sum_{i=0}^{n} B_i x^{n-i} y^i$ and
$B^{\perp}(x,y) = \sum_{i=0}^{n} B_i^{\perp} x^{n-i} y^i$, where [3]

$$B_i = \frac{1}{K^2} \sum_{\mathrm{wt}(E)=i} \mathrm{Tr}^2(EP)
\quad\text{and}\quad
B_i^{\perp} = \frac{1}{K} \sum_{\mathrm{wt}(E)=i} \mathrm{Tr}(PEPE).$$

II.  Error  Detection 

It is possible to define error detection for quantum codes in
several ways. If the measurement of the received state produces
a vector in the subspace orthogonal to Q, this indicates
a detectable error. If this measurement gives a vector in Q,
the error is not detected. However, in a general situation, we
assume that if the received state is very close to the transmitted
state, no error has occurred.

Calculating the probability $P_{ue}(Q,p)$ of undetected error
under these assumptions, we obtain

Theorem 1

$$P_{ue}(Q,p) = \sum_{i=0}^{n} \left(B_i^{\perp} - B_i\right) \left(\frac{p}{3}\right)^{i} (1-p)^{n-i}.$$

We  also  consider  some  other  possible  definitions  of  undetected 
error  that  arise  under  natural  physical  assumptions.  In  all  the 
cases  the  expressions  obtained  are  the  same  as  in  the  theorem 
(up  to  a  constant  factor). 
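
The expression in Theorem 1 is straightforward to evaluate once the two weight distributions are known. The following is a minimal sketch, not part of the original abstract: the function name is hypothetical, and the sample input is the commonly quoted pair of weight distributions of the five-qubit [[5,1,3]] stabilizer code, used here only as an illustrative assumption.

```python
def undetected_error_probability(B, B_perp, p):
    """Evaluate sum_i (B_perp[i] - B[i]) * (p/3)**i * (1-p)**(n-i).

    B and B_perp are lists of length n + 1 holding B_0..B_n and the
    corresponding B_i^perp values; p is the depolarizing probability.
    """
    n = len(B) - 1
    return sum((bp - b) * (p / 3) ** i * (1 - p) ** (n - i)
               for i, (b, bp) in enumerate(zip(B, B_perp)))


# Illustrative input (an assumption, not taken from the abstract):
# weight distributions usually quoted for the [[5,1,3]] code.
B_513 = [1, 0, 0, 0, 15, 0]
B_PERP_513 = [1, 0, 0, 30, 15, 18]

if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3):
        print(p, undetected_error_probability(B_513, B_PERP_513, p))
```

Only the coefficients with $B_i^{\perp} \ne B_i$ contribute, so for this sample input the sum reduces to the terms with i = 3 and i = 5.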


To describe the behavior of the probability $P_{ue}$ for best
possible codes, we define the exponent

$$E(R,p) = \limsup_{n\to\infty} \left(-\frac{1}{n} \log_2 P_{ue}(n,R,p)\right),$$

where $P_{ue}(n,R,p)$ is the minimal attainable probability for
codes of rate R.


III.  Lower  Bound 

Let $T_4(x,y) = x \log_4 3 - x \log_4 y - (1-x) \log_4 (1-y)$ and
$H_4(x) = T_4(x,x)$. Let $\delta(R) = H_4^{-1}((1-R)/2)$.

Lemma 1 There exists a sequence of stabilizer codes of rate
R such that $B_i = 0$ for $1 \le i < n\delta(R)$ and
$B_i^{\perp} \le n \binom{n}{i} 3^i 2^{k-n}$ for $n\delta(R) \le i \le n$,
where $k = Rn$.

Computing $P_{ue}(Q,p)$ for a sequence of codes Q from this
lemma, we obtain the following lower bound on $E(R,p)$.

Theorem 2

$$E(R,p) \ge \begin{cases}
T_4\left(H_4^{-1}((1-R)/2),\, p\right), & 0 \le R \le 2(1 - H_4(p)) - 1, \\
\ldots & \text{otherwise.}
\end{cases}$$
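
The first branch of Theorem 2 can be evaluated numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes $T_4$ and $H_4$ as defined in Section III, reads the argument of $H_4^{-1}$ as $(1-R)/2$ as in Theorem 2, and inverts $H_4$ by bisection on $[0, 3/4]$, where $H_4$ is increasing. All function names are hypothetical.

```python
import math


def T4(x, y):
    """T_4(x, y) = x log4 3 - x log4 y - (1 - x) log4 (1 - y), for 0 < y < 1."""
    return (x * math.log(3, 4) - x * math.log(y, 4)
            - (1 - x) * math.log(1 - y, 4))


def H4(x):
    """Quaternary entropy H_4(x) = T_4(x, x), with H_4(0) = 0."""
    return 0.0 if x == 0.0 else T4(x, x)


def H4_inverse(h, tol=1e-12):
    """Inverse of H_4 on [0, 3/4], found by bisection (0 <= h <= 1)."""
    lo, hi = 0.0, 0.75
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H4(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lower_bound_first_branch(R, p):
    """T_4(H_4^{-1}((1 - R)/2), p), valid for 0 <= R <= 1 - 2 H_4(p)."""
    if not 0.0 <= R <= 1.0 - 2.0 * H4(p) + 1e-12:
        raise ValueError("rate outside the range of the first branch")
    return T4(H4_inverse((1.0 - R) / 2.0), p)
```

At the right end of the range, $R = 1 - 2H_4(p)$, the inverse returns $\delta = p$ and the bound equals $T_4(p,p) = H_4(p)$, which gives a quick sanity check of the implementation.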


IV.  Upper  Bounds 

Theorem 3 Let $Z(x) = \sum_{i=0}^{n} z_i K_i(n,4,x)$ be a polynomial
expanded in the basis of the Krawtchouk polynomials. Suppose
that

$$Z(i) \le (p/3)^{i} (1-p)^{n-i}$$

and

$$4^{(1/2)(1+R)n} z_i - Z(i) \ge 0 \quad (1 \le i \le n).$$

Then $P_{ue}(R,n) \ge z_0\, 4^{(1/2 + R/2)n} - Z(0)$.

By an appropriate choice of polynomials we derive two asymptotic
upper bounds on $E(R,p)$. We cite the first one. Denote
by $R_{lp}(\delta)$ the upper bound [1] on the asymptotic rate of
quaternary codes with distance $\delta$ and by $\delta_{lp}(R)$ its inverse
function.


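Theorem 4

$$E(R,p) \le \begin{cases}
\frac{1-R}{2} - H_4\left(\delta_{lp}\left(\frac{1+R}{2}\right)\right) + T_4\left(\delta_{lp}\left(\frac{1+R}{2}\right),\, p\right), & R \le 2R_{lp}(p) - 1, \\
\ldots & \text{otherwise.}
\end{cases}$$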


The  second  bound  derived  in  the  work  improves  this  theorem 
for  medium  code  rates.  The  journal  version  of  this  talk  is 
published  in  IEEE  Trans.  Inform.  Theory,  May  2000. 


References 

[1]  M.  Aaltonen,  “Linear  programming  bounds  for  tree  codes,” 
IEEE  Trans.  Inform.  Theory,  vol.  25,  no.  1,  pp.  85-90,  1979. 

[2]  A.R.  Calderbank,  E.M.  Rains,  P.W.  Shor  and  N.J.A.  Sloane, 
“Quantum  error  correction  and  orthogonal  geometry,”  Phys. 
Rev.  Lett.,  vol.  78,  pp.  405-409,  1997. 

[3]  P.W.  Shor  and  R.  Laflamme,  “Quantum  analog  of  the 
MacWilliams  identities  in  classical  coding  theory,”  Phys.  Rev. 
Lett.,  vol.  78,  pp.  1600-1602,  1997. 


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


Multiuser  detection  in  a  quantum  channel 

Julio  Concha1  and  H.  V.  Poor1 
Department  of  Electrical  Engineering 
Princeton  University 
Princeton,  NJ  08544,  USA 
e-mail: {jiconcha, poor}@princeton.edu


I.  Introduction 

The use of multiuser detection techniques in multiple access
optical channels has been studied in the literature, with
emphasis on optical CDMA [2]. In general, it has been assumed
that the optical detection is carried out by PIN diodes, which
count the photons present in the field. However, the theory
of quantum detection [1] indicates that other measurements
might yield significantly smaller error probabilities.

In  this  paper,  we  show  by  example  that  this  can  happen 
in  the  multiple-access  case.  We  also  note  that  the  quantum 
multiuser  detection  problem  differs  from  the  classical  one,  in 
that  quantum  measurement  precludes  the  use  of  matched  filter 
banks  for  non-orthogonal  signals. 


II.  Channel  model 

We assume that K users transmit information via the
electromagnetic field with OOK modulation. Specifically, user k
sends a bandpass signal $s_k(t)$ to indicate a “1” and no signal
to indicate a “0”. $s_k$ can be conveniently described by its
lowpass equivalent $S_k$, where $s_k(t) = \mathrm{Re}\,[S_k(t)\, e^{i 2\pi f_c t}]$ and $f_c$ is
the carrier frequency. We will take these signals to represent
the electric field in a quasi-monochromatic linearly polarized
coherent light beam.

Since we want to study the effect of multiple-access
interference, we ignore the effects of noise and assume that the
different users transmit synchronously. Thus, the detector
receives the linear superposition of the $S_k$'s, which excite various
modes of the detector aperture field. The resulting quantum
state, described by a density operator $\rho$, contains the
transmitted information. The receiver then has to decide which
one of the several possible $\rho$'s is present, using a probability
operator-valued measure (POVM). This is a collection of
Hermitian positive-definite operators $\Pi_k$, which must be chosen
so that the probability of detection error is minimized.

In the single-user case, we can consider a “matched filter”
detector, such that $S_1(t)$ coincides with one of the temporal
modes of the field. Then the hypotheses to be tested are
$|\psi\rangle = |\alpha\rangle$ vs. $|\psi\rangle = |0\rangle$, where $|\alpha\rangle$ is a coherent state and
$|\psi\rangle\langle\psi|$ is the received density operator. It is shown in [1]
that in this case an optimally-designed POVM achieves a lower
probability of error than a detector based on photon counting.
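
The single-user comparison can be made concrete with two standard textbook expressions, which are assumptions here since the abstract does not state them: for equiprobable OOK with a coherent state of mean photon number N, a photon counter errs only when a “1” produces zero counts, giving $\frac{1}{2}e^{-N}$, while the minimum-error (Helstrom) measurement for two pure states with squared overlap $e^{-N}$ gives $\frac{1}{2}(1 - \sqrt{1 - e^{-N}})$. The function names below are hypothetical.

```python
import math


def photon_counting_error(N):
    """Error probability of a photon counter for equiprobable OOK.

    An error occurs only if a '1' (coherent state, mean photon number N)
    yields zero counts, which happens with probability exp(-N).
    """
    return 0.5 * math.exp(-N)


def helstrom_error(N):
    """Minimum error probability for discriminating |alpha> from |0>.

    Uses the two-pure-state Helstrom formula with equal priors and
    squared overlap |<0|alpha>|^2 = exp(-N).
    """
    return 0.5 * (1.0 - math.sqrt(1.0 - math.exp(-N)))


if __name__ == "__main__":
    for N in (0.5, 1.0, 5.0, 10.0):
        print(N, photon_counting_error(N), helstrom_error(N))
```

For large N the optimal measurement gains roughly a factor of two over photon counting in this single-user setting.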

III.  Multiuser  detection 

Now consider the case K = 2. If $S_1$ and $S_2$ are orthogonal,
they can again be aligned with two temporal modes of
the field, so that the hypotheses to test are $|\psi\rangle = |0\rangle|0\rangle$ vs.
$|\psi\rangle = |\alpha\rangle|0\rangle$ vs. $|\psi\rangle = |0\rangle|\alpha\rangle$ vs. $|\psi\rangle = |\alpha\rangle|\alpha\rangle$. (We
assume that both users transmit the same average number of
photons, $N = \alpha^2$.) It can be shown that in this case the
optimal quantum detector is equivalent to two matched filters
acting independently on each mode. In general this is true if
the received density operator when user 1 sends symbol i and
user 2 sends symbol j is of the form $\rho_{ij} = \rho_i \otimes \rho_j$.

¹This work was supported by the National Science Foundation
under Grant CCR-99-80590.

Figure 1: Probability of symbol error for the ML photon
counter and the optimal quantum detector.

If $S_1$ and $S_2$ are not orthogonal, we can no longer assign
separate modes to the different users, so that independent
matched filtering is not only not optimal, but actually not
possible. As an alternative we can take

$$g_1 = S_1 / \|S_1\| \qquad (1)$$

$$g_2 \propto S_2 / \|S_2\| - r g_1 \qquad (2)$$

where r is the correlation coefficient between $S_1$ and $S_2$.
Hence, the 4 hypotheses are $|\psi\rangle = |0\rangle|0\rangle$ vs.
$|\psi\rangle = |\beta_1\rangle|\beta_2\rangle$ vs. $|\psi\rangle = |\alpha\rangle|0\rangle$ vs.
$|\psi\rangle = |\alpha + \beta_1\rangle|\beta_2\rangle$,
where $r = \beta_1/\alpha$ and the two user powers are equal, i.e.
$N = \alpha^2 = \beta_1^2 + \beta_2^2$.
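
The mode construction in (1) and (2) is a Gram-Schmidt step. The sketch below illustrates it under simplifying assumptions that are not in the abstract: the signals are real-valued and sampled into plain Python lists, and the helper names are hypothetical. It also computes the amplitudes $\beta_1 = r\alpha$ and $\beta_2 = \alpha\sqrt{1-r^2}$ implied by $r = \beta_1/\alpha$ and $\alpha^2 = \beta_1^2 + \beta_2^2$.

```python
import math


def inner(u, v):
    """Inner product of two real sampled signals."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(inner(u, u))


def orthonormal_modes(S1, S2):
    """Return (g1, g2, r): the modes of (1)-(2) and the correlation r."""
    g1 = [a / norm(S1) for a in S1]
    s2 = [a / norm(S2) for a in S2]
    r = inner(s2, g1)                       # correlation coefficient
    residual = [a - r * b for a, b in zip(s2, g1)]
    g2 = [a / norm(residual) for a in residual]  # normalize eq. (2)
    return g1, g2, r


def coherent_amplitudes(alpha, r):
    """Amplitudes of user 2 in modes g1 and g2 for equal user powers."""
    beta1 = r * alpha
    beta2 = alpha * math.sqrt(1.0 - r * r)
    return beta1, beta2


if __name__ == "__main__":
    g1, g2, r = orthonormal_modes([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
    print(r, coherent_amplitudes(math.sqrt(10.0), r))
```

The construction requires $|r| < 1$; for fully correlated signals the residual in (2) vanishes and no second mode exists.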

Fig. 1 shows the probability of (symbol) error of the optimal
quantum detector as the correlation coefficient varies
between 0 and 1, for N = 10 and N = 30. The dashed
line corresponds to a maximum-likelihood multiuser detector
based on the number of photon counts in each of the modes.
It can be seen that the optimal quantum detector outperforms
the ML photon counter by 3 orders of magnitude for r ≈ 0.3.
Abstract — The aim of this paper is the explicit calculation
of the classical capacity of quantum Gaussian
channels, in particular, of those using squeezed states.
The calculation is based on a general formula for the
entropy of a quantum Gaussian state, which is of independent
interest, and on the recently proved coding
theorem for quantum communication channels.

I.  Introduction 

One of the recent achievements of quantum information
theory is the direct coding theorem for transmission of classical
information through quantum communication channels,
which provides an explicit formula for the capacity of the channel
as the supremum of the quantum entropy bound with respect
to input probability distributions. This result was extended
to channels with constrained inputs [2], among which channels
with additive quantum Gaussian noise and constrained
signal power are most important for applications,
because the class of quantum Gaussian states includes coherent
and squeezed states, together with their thermal mixtures. In
this talk we present several results concerning the capacity of
quantum Gaussian channels.

II.  Classical  signal  plus  quantum  noise 

We consider a quantum system, such as a cavity field with
a finite number of modes, described by annihilation operators
$a_1, \ldots, a_s$ satisfying the canonical commutation relation
(CCR). Let $\mathcal{H}$ be the Hilbert space of an irreducible representation
of the CCR, and let $\rho(0)$ be a density operator in $\mathcal{H}$ describing
the state of the cavity field. Consider the family of density
operators

$$\rho(\mu) = D(\mu)\rho(0)D(\mu)^{\dagger}; \quad \mu = (\mu_j) \in \mathbb{C}^s, \qquad (1)$$

where $D(\mu)$ is the displacement operator in $\mathcal{H}$. In communication
theory $\rho(0)$ describes background noise, comprising
quantum noise, and $\mu$ is the classical signal. Thus the mapping
$\mu \to \rho(\mu)$ is a classical-quantum channel in the sense of
[2].

According to [3], the capacity of such a channel is equal to

$$C = \sup_{P \in \mathcal{P}_1} H(\rho_P) - H(\rho(0)), \qquad (2)$$

where $H(\rho) = -\mathrm{Tr}\,\rho \log \rho$ is the von Neumann entropy,
$\rho_P = \int \rho(\mu) P(d\mu)$, and $\mathcal{P}_1$ is a convex subset of probability
distributions $P(d\mu)$ on $\mathbb{C}^s$, satisfying the power constraint

$$\int \sum_{j=1}^{s} \hbar\omega_j |\mu_j|^2 P(d\mu) \le E. \qquad (3)$$


III.  The  capacity  of  quantum  Gaussian  channels 

Let $\rho(0)$ be the Gaussian density operator with m = 0
and the correlation matrix $\alpha$. Let P be a Gaussian probability
distribution with correlation matrix $\beta$ and zero mean. The
inequality (3) then takes the form:

$$\mathrm{Sp}\, \varepsilon\beta \le E, \qquad (4)$$

where $\varepsilon$ is the diagonal energy matrix.

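Theorem. The capacity of the Gaussian channel is equal
to

$$C = \max_{\beta \in \mathcal{B}_1} \tfrac{1}{2}\, \mathrm{Sp}\, g\left(\mathrm{abs}(\Delta^{-1}(\alpha + \beta)) - I/2\right)
- \tfrac{1}{2}\, \mathrm{Sp}\, g\left(\mathrm{abs}(\Delta^{-1}\alpha) - I/2\right), \qquad (5)$$

where $g(x) = (x+1)\log(x+1) - x\log x$ and $\mathcal{B}_1$ is the convex
set of real positive matrices $\beta$ satisfying (4).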

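Example. In the case of a squeezed state $\rho(0)$ in one mode
let $N_s = E/\hbar\omega$, $N = \mathrm{Tr}\,\rho(0)a^{\dagger}a = (\omega^2\alpha^{qq} + \alpha^{pp})/2\hbar\omega$ be,
correspondingly, the mean photon numbers in the signal and
in the noise. Then if $N^2 + N \le N_s^2$, the capacity of the
squeezed state channel is

$$C = g(N + N_s), \qquad (6)$$

otherwise

$$C = g\left(\sqrt{N_s\left(2N + 1 + 2\sqrt{N^2 + N}\right) + \tfrac{1}{4}} - \tfrac{1}{2}\right). \qquad (7)$$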

From these expressions one sees that using squeezed states
under constrained input energy E does increase the capacity.
On the other hand, with restricted output energy $\hbar\omega(N + N_s)$
one cannot obtain capacity greater than (6), which is known
to be the absolute maximum under this constraint.
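
The two-branch capacity of the example can be evaluated with a few lines of code. This is a sketch under explicit assumptions: it reads the switching condition as $N^2 + N \le N_s^2$ and formula (7) as $C = g\big(\sqrt{N_s(2N + 1 + 2\sqrt{N^2+N}) + 1/4} - 1/2\big)$, uses natural logarithms (capacity in nats), and the function names are hypothetical.

```python
import math


def g(x):
    """g(x) = (x + 1) log(x + 1) - x log x, with g(0) = 0."""
    if x <= 0.0:
        return 0.0
    return (x + 1.0) * math.log(x + 1.0) - x * math.log(x)


def squeezed_state_capacity(N_s, N):
    """Capacity (nats) of the one-mode squeezed-state channel.

    N_s is the mean photon number of the signal and N that of the noise.
    Both the branch condition and the second branch are assumed readings
    of formulas (6) and (7).
    """
    if N * N + N <= N_s * N_s:
        return g(N + N_s)                                   # formula (6)
    det = N_s * (2.0 * N + 1.0 + 2.0 * math.sqrt(N * N + N)) + 0.25
    return g(math.sqrt(det) - 0.5)                          # formula (7)


if __name__ == "__main__":
    for N_s in (0.5, 1.0, 2.0, 5.0):
        print(N_s, squeezed_state_capacity(N_s, 1.0))
```

Under these readings the two branches agree at the switching point $N_s^2 = N^2 + N$, which is a useful consistency check on the reconstruction.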
Abstract — We study the problem of compressing
a block of symbols (a block quantum state) emitted
by a memoryless quantum Bernoulli source. We
present a simple-to-implement quantum algorithm for
projecting, with high probability, the block quantum
state onto the typical subspace spanned by the leading
eigenstates of its density matrix. We propose a
fixed-rate quantum Shannon-Fano code to compress
the projected block quantum state using a per-symbol
code rate that is slightly higher than the von
Neumann entropy limit. Finally, we propose quantum
arithmetic codes to efficiently implement quantum
Shannon-Fano codes.

I.  Extended  Abstract 

Modern information theory makes fundamental assumptions
concerning the physical representation and processing of
information. Following the lead of classical mechanics, modern
information theory assumes that an information bit can
exist in either one of two states, say, 0 or 1. However, classical
physics is known to fail spectacularly under many circumstances,
for example, when the objects being described
are very small or have very large energies. This regime of
physics is described by the laws of quantum mechanics. Conventional
information theory fails to properly describe how
information can be represented and transformed in such physical
systems, and must be replaced by an appropriate quantum
analog: quantum information theory. In contrast to the classical
information bit, a quantum information bit can exist in
a superposition of two orthogonal quantum states.

The problem of compression is central to storage and transmission
of quantum data. We investigate quantum algorithms
for compressing a sequence of symbols emitted by a memoryless
quantum Bernoulli source. The basis for compression of
classical data is Shannon’s noiseless coding theorem: if the
per-symbol code rate is slightly larger than the Shannon entropy,
then there exists a block code (with sufficiently large
block size) such that the compressed message can be recovered
with probability close to unity. The quantum analogue
to Shannon’s theorem is Schumacher’s theorem [2]: if the
per-symbol code rate is slightly larger than the von Neumann entropy,
then there exists a block code (with sufficiently large
block size) such that the compressed message can be recovered
with average fidelity close to unity. The similarity of the two
theorems makes it possible to use, to a limited extent, classical
algorithms for performing quantum data compression.
However, classical compression codes cannot immediately be
translated into quantum versions; for example, in order to preserve
the coherent quantum state, all operations performed on
the data must be reversible and must not entangle the state
with any temporary variables. Furthermore, it is essential that
the original state be entirely obliterated in producing the
encoded state, because quantum states cannot be cloned.

The statistics underlying a quantum memoryless Bernoulli
source are completely captured by its density matrix. The
fundamental idea behind quantum data compression is to analyze
the eigen-structure of the joint density matrix associated
with a block quantum state emitted by the quantum memoryless
Bernoulli source. As our first contribution, we present a
quantum-mechanical algorithm for projecting the block quantum
state onto the subspace spanned by the leading (or typical)
eigenstates of the joint density matrix. Our algorithm
computes, in parallel, an indicator function that is 0 if the
eigenstate is typical and 1 otherwise. By making a measurement
on the quantum bit associated with the indicator function,
with very high probability, we project the block quantum
state onto the typical subspace spanned by the leading
eigenstates. Our theoretical results represent a strengthening
of Schumacher’s pioneering result in that they hold for fixed
block sizes and they deliver a rate of convergence.
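
The role of the indicator function can be illustrated with a purely classical calculation on the eigenvalues. The sketch below is not the quantum algorithm of the paper: it assumes a qubit source whose density matrix has eigenvalues lam and 1 - lam with 0 < lam < 1, labels each eigenstate of the n-fold product by a binary string, and uses the standard epsilon-typicality test. All names and the choice of test are assumptions.

```python
import math


def von_neumann_entropy(lam):
    """Entropy in bits of a qubit density matrix with eigenvalues lam, 1 - lam."""
    return -sum(x * math.log2(x) for x in (lam, 1.0 - lam) if x > 0.0)


def indicator(weight, n, lam, eps):
    """Return 0 if an eigenstate with `weight` ones is typical, else 1.

    The eigenvalue of such a state is lam**(n - weight) * (1 - lam)**weight.
    """
    entropy = von_neumann_entropy(lam)
    log_eig = (n - weight) * math.log2(lam) + weight * math.log2(1.0 - lam)
    return 0 if abs(-log_eig / n - entropy) <= eps else 1


def typical_subspace(n, lam, eps):
    """Dimension of the typical subspace and the probability of landing in it."""
    dim, prob = 0, 0.0
    for w in range(n + 1):
        if indicator(w, n, lam, eps) == 0:
            mult = math.comb(n, w)          # eigenstates with this weight
            dim += mult
            prob += mult * lam ** (n - w) * (1.0 - lam) ** w
    return dim, prob


if __name__ == "__main__":
    n, lam, eps = 200, 0.9, 0.1
    dim, prob = typical_subspace(n, lam, eps)
    print(math.log2(dim) / n, von_neumann_entropy(lam), prob)
```

The printed ratio $\log_2(\dim)/n$ stays below the entropy plus eps, which is the counting fact behind representing each leading eigenstate with roughly the logarithm of the typical-subspace dimension.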

The projection onto the typical subspace wipes out the
trailing eigenstates, and, hence, the projected quantum state
lies in the low-dimensional typical subspace. Consequently,
each leading eigenstate can be represented using roughly the
logarithm of the dimension of the typical subspace. The central
problem of quantum data compression is to efficiently
compute such low-dimensional representations. As our second
contribution, we propose a quantum Shannon-Fano code to
represent and compress the projected block quantum state
using a per-symbol code rate that is slightly higher than the
von Neumann entropy limit.

As our third contribution, we propose quantum arithmetic
codes to efficiently implement quantum Shannon-Fano
codes. Our arithmetic encoder and decoder use a certain
finite-precision arithmetic process that is inspired by classical
arithmetic coding. The novelty of quantum arithmetic coding
is to implement finite-precision arithmetic processes in a
quantum-mechanically reversible fashion. Our arithmetic encoder
and decoder have cubic circuit complexity and cubic computational
complexity in the block size. The proposed encoder and decoder
are quantum-mechanical inverses of each other, and constitute
a very satisfying example of reversible quantum computation.
I. Introduction

Run length constraints derive from digital storage applications
[2]. For nonnegative integers d and k, a binary sequence
is said to satisfy a one-dimensional (d, k)-constraint if every
run of zeros has length at least d and at most k (if two ones
are adjacent in the sequence we say that a run of zeros of
length zero is between them). A two-dimensional binary pattern
arranged in an m × n rectangle is said to be $(d_1, k_1, d_2, k_2)$-constrained
if it satisfies a one-dimensional $(d_1, k_1)$-constraint
horizontally and a one-dimensional $(d_2, k_2)$-constraint vertically.
The two-dimensional $(d_1, k_1, d_2, k_2)$-capacity is defined as

$$C_{d_1,k_1,d_2,k_2} = \lim_{m,n\to\infty} \frac{\log_2 N_{m,n}^{(d_1,k_1,d_2,k_2)}}{mn},$$

where $N_{m,n}^{(d_1,k_1,d_2,k_2)}$ denotes the number of m × n rectangles
that are $(d_1, k_1, d_2, k_2)$-constrained. If $d = d_1 = d_2$
and $k = k_1 = k_2$ (this is called the symmetric constraint)
then the two-dimensional $(d_1, k_1, d_2, k_2)$-capacity is called the
two-dimensional (d, k)-capacity, and is denoted by $C_{d,k}$. A
proof was given in [3] that shows the two-dimensional (d, k)-capacities
exist, and essentially the same proof shows that the
$C_{d_1,k_1,d_2,k_2}$ exist.

The two-dimensional asymmetric positive capacity region is
the set

$$\{(d_1, k_1, d_2, k_2) : C_{d_1,k_1,d_2,k_2} > 0\}.$$


A basic question is to determine which constraints actually
lie in the positive capacity region and which do not. For the
symmetric constraints, it was shown in [1] that $C_{1,2} = 0$ and
a complete characterization of which (d, k) integer pairs yield
positive capacities was given in [3] and is stated as the
proposition below.


Theorem 1 Let $d_1$, $k_1$, $d_2$, and $k_2$ be nonnegative integers
such that $d_1 \le k_1$ and $d_2 \le k_2$. Let $d = \min(d_1, d_2)$,
$D = \max(d_1, d_2)$, $k = \min(k_1, k_2)$, $K = \max(k_1, k_2)$, $\delta = k - D$,
and $\Delta = K - d$. Then the following partially characterizes
the positive capacity region of two-dimensional run length
constrained channels:

(i) If $\delta \le 0$ then $C_{d_1,k_1,d_2,k_2} = 0$.

(ii) If $\delta = 1$ then

(A) If $d = 0$ then $C_{d_1,k_1,d_2,k_2} > 0$.

(B) If $d \ge 1$ then

(a) If $\Delta \le 1$ then $C_{d_1,k_1,d_2,k_2} = 0$.

(b) If $\Delta > d_1 = d_2$ then $C_{d_1,k_1,d_2,k_2} > 0$.

(c) If $\Delta \ge 3$ and $d = 1$ then $C_{d_1,k_1,d_2,k_2} > 0$.

(iii) If $\delta \ge 2$ then $C_{d_1,k_1,d_2,k_2} > 0$.
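
The case analysis of Theorem 1 translates directly into a small decision procedure. The sketch below is illustrative only: the function name and the three string labels are hypothetical, and the inequality signs are read as non-strict where the scanned text is ambiguous ($\delta \le 0$, $d \ge 1$, $\Delta \le 1$, $\Delta \ge 3$, $\delta \ge 2$), which matches the discussion of the open case that follows the theorem.

```python
def classify_capacity(d1, k1, d2, k2):
    """Classify C_{d1,k1,d2,k2} as 'zero', 'positive', or 'unknown'.

    Follows the cases of Theorem 1; 'unknown' marks the constraints the
    theorem leaves uncharacterized.
    """
    if min(d1, k1, d2, k2) < 0 or d1 > k1 or d2 > k2:
        raise ValueError("need nonnegative integers with d1 <= k1 and d2 <= k2")
    d, D = min(d1, d2), max(d1, d2)
    k, K = min(k1, k2), max(k1, k2)
    delta = k - D
    Delta = K - d
    if delta <= 0:                       # part (i)
        return "zero"
    if delta >= 2:                       # part (iii)
        return "positive"
    # From here on delta == 1, i.e. part (ii).
    if d == 0:                           # (A)
        return "positive"
    if Delta <= 1:                       # (B)(a)
        return "zero"
    if d1 == d2 and Delta > d1:          # (B)(b)
        return "positive"
    if Delta >= 3 and d == 1:            # (B)(c)
        return "positive"
    return "unknown"


if __name__ == "__main__":
    for c in [(1, 2, 1, 3), (1, 3, 2, 3), (2, 3, 2, 4), (1, 2, 3, 5)]:
        print(c, classify_capacity(*c))
```

For example, (1, 2, 1, 3) falls under (ii)(B)(b), while (1, 3, 2, 3) is not covered by any part and is reported as unknown.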

The only case that is presently not completely characterized
in Theorem 1 is part (ii)(B), namely when $\delta = 1$, $d \ge 1$,
and $\Delta \ge 2$. If $\delta = 1$, $d = 1$, and $\Delta = 2$ then the only
capacities that need be considered are $C_{1,2,1,3}$ and $C_{1,3,2,3}$. But
$C_{1,2,1,3} > 0$ from part (ii(B)b). Thus if we were able to show
that $C_{1,3,2,3} > 0$ then we could replace $\Delta \ge 3$ by $\Delta \ge 2$
in part (ii(B)c). However, computer simulation suggests, but
does not prove, that perhaps $C_{1,3,2,3} = 0$. This remains an
open question.

Also, computer simulations suggest the plausibility of
Conjecture 1 below, for which we presently do not have a proof
either.

Conjecture 1 $C_{d,d+1,d,2d} = 0$ whenever $d > 0$.


Proposition 1 $C_{d,k} > 0$ if and only if $k - d \ge 2$ or $(d, k) = (0, 1)$.


Conjecture 1 would characterize with Theorem 1(ii(B)b) the
positive capacity region for $k = d + 1$ and $d_1 = d_2$ as:

$C_{d,K,d,d+1} = C_{d,d+1,d,K} = 0$ if and only if $K \le 2d$.


II. Main Results

In the present paper we determine whether or not the two-dimensional
capacity is positive, for a large set of asymmetric
constraints $(d_1, k_1, d_2, k_2)$, and the main results are summarized
in Theorem 1. It is interesting to note that for the
symmetric constraint (i.e. when $d_1 = d_2$ and $k_1 = k_2$), the
capacity is zero whenever d and k are positive and differ by
one, whereas for many asymmetric constraints the capacity is
positive when the horizontal constraints or the vertical constraints
differ by one (e.g. Theorem 1 part (ii(B)b)). However,
in the asymmetric case if, for example, $k_1 = d_1 + 1 \le d_2$ then
the capacity is zero (by Theorem 1 part (i)).

*This work was supported in part by the National Science Foundation
and by a JSPS Fellowship for Young Scientists.
Abstract — We show that a Read/Write Isolated
Channel can be modelled as a constrained binary matrix.
This permits the use of constrained matrix techniques
to bound the capacity of the channel, improving
on the older known bounds.

I.  Introduction 

A serial binary (0, 1) memory is read isolated if no two
consecutive positions in the memory may both store 1’s. A serial
binary (0, 1) memory that undergoes rewriting is write isolated
if it satisfies the restriction that no two consecutive positions
in the memory can be changed during one rewriting phase.

A read/write isolated memory (RWIM) is a binary, linearly
ordered, rewritable storage medium that obeys both the read
and write restrictions. This type of memory was considered by
Cohn in [3], who examined its channel capacity. The set of all
permissible binary memory configurations can be considered
as a channel alphabet. The rewriting restrictions determine
which characters may follow which characters in the channel.
The channel capacity of this process can then be defined as
follows [2] [8]: let k be the size of the memory in binary symbols,
r the lifetime of the memory in rewrite cycles and N(k, r) the
number of distinct sequences of r characters. For fixed k, the
channel capacity, measured in bits per rewrite, is defined to
be [6]

$$C_k = \lim_{r\to\infty} \frac{1}{r} \log_2 N(k, r).$$

The channel capacity of the read/write isolated memory, in
bits per symbol per rewrite, is then defined to be

$$C = \lim_{k\to\infty} \frac{1}{k} C_k.$$

In [3] Cohn established several expressions for the capacities
$C_k$ and derived the following upper and lower bounds on C:
0.50913... ≤ C ≤ 0.56029.... In this paper we continue the
investigation of the channel capacity and manage to refine the
bounds to

0.53500... ≤ C ≤ 0.55209....

We also provide reasons to conjecture that C = 0.53500....

II.  Constrained  Matrices 

The main observation is that there is another way of viewing
the rewriting process. Suppose k, the size of the memory, and
r, the number of rewrites, are known. Then we can define B,
an r × k binary matrix: for all 1 ≤ i ≤ k, 1 ≤ j ≤ r,

B(j, i) = content of location i after the (j − 1)st rewrite.

'This  work  partially  supported  by  Hong  Kong  CERG  grants 
HKUST652/95E,  6082/97E  and  6137/98E  and  DIMACS 


Thus the jth row of B is the content of the memory after the
(j − 1)st rewrite. Translating the RWIM rules into matrix
notation shows that B satisfies the following two constraints:

1. read restriction: B does not contain any two horizontally
consecutive ones, i.e., it does not contain any 1 × 2
submatrix (1 1);

2. write restriction: B does not contain any 2 × 2 submatrix
of the form $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ or $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

Note too that if B is any r × k binary matrix that obeys
the two conditions above then B can be viewed as modelling
a memory with the jth row of B being the content of the
memory at time j. The memory thus modelled satisfies the
read/write isolated conditions. We have therefore just seen
that N(k, r), previously defined as the number of distinct
sequences of r characters, is also the number of r × k binary
matrices that satisfy conditions (1) and (2).
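
For small k and r, N(k, r) can be counted directly from the matrix formulation. The sketch below is a brute-force illustration, not the bounding technique of the paper: it enumerates the rows allowed by the read restriction, treats two rows as compatible when no two adjacent positions both change (the write restriction), and counts row sequences by dynamic programming. Function names are hypothetical.

```python
import math
from itertools import product


def valid_rows(k):
    """All binary rows of length k with no two horizontally adjacent ones."""
    return [row for row in product((0, 1), repeat=k)
            if all(not (row[i] and row[i + 1]) for i in range(k - 1))]


def compatible(x, y):
    """True if rewriting row x into row y changes no two adjacent positions."""
    return all(not (x[i] != y[i] and x[i + 1] != y[i + 1])
               for i in range(len(x) - 1))


def count_rwim(k, r):
    """N(k, r): number of r x k binary matrices obeying both restrictions."""
    rows = valid_rows(k)
    counts = {row: 1 for row in rows}      # matrices with a single row
    for _ in range(r - 1):
        counts = {y: sum(c for x, c in counts.items() if compatible(x, y))
                  for y in rows}
    return sum(counts.values())


def capacity_estimate(k, r):
    """Finite-size estimate log2 N(k, r) / (r k), in bits per symbol per rewrite."""
    return math.log2(count_rwim(k, r)) / (r * k)


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(k, capacity_estimate(k, 30))
```

Such finite-size values only approximate C, since both limits in the definition are truncated; they are meant as a check on the model, not as bounds.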

This observation permits noticing that C is not only the
capacity of the RWIM channel but also the capacity of the
constrained matrices satisfying (1) and (2). We can therefore
use transform matrix techniques developed to study the capacity
of constrained matrices, e.g., in [1][4][5][7], to derive
the better bounds. We note that these techniques have to be
modified slightly to deal with the fact that the constraint system
here is not symmetric, i.e., if matrix B satisfies (1) and
(2) it is possible that $B^T$ does not satisfy (1) and (2)
(previously studied constraints all seem to have been symmetric
and the techniques implicitly used this symmetry).
I. Introduction

Run length constraints derive from digital storage applications
[3]. For nonnegative integers d and k, a binary sequence
is (d, k)-constrained if there are at most k consecutive zeros
and between every two ones there are at least d consecutive
zeros. An n-dimensional pattern of zeros and ones arranged
in an $m_1 \times m_2 \times \cdots \times m_n$ hyper-rectangle is (d, k)-constrained
if it is (1-dimensional) (d, k)-constrained in each of the n
coordinate axis directions. The n-dimensional (d, k)-capacity is
defined as

$$C^{(n)}_{d,k} = \lim_{m_1, m_2, \ldots, m_n \to \infty} \frac{\log_2 N^{(d,k)}_{m_1,m_2,\ldots,m_n}}{m_1 m_2 \cdots m_n},$$

where  AIm1!?m2,...,Tn„  -denotes  the  number  of  (d,  fc)-constrained 
patterns  on  an  mi  x  m2  x  •  •  •  x  mn  hyper-rectangle.  A  sim¬ 
ple  proof  was  given  in  [5]  that  shows  the  existence  of  two- 
dimensional  (d,  fc)-capacities,  and  a  slight  modification  of  the 
proof  can  show  that  the  n-dimensional  (d,  fc)-capacities  exist. 
The  capacity  C^l  represents  the  maximum  number  of  bits  of 
information  that  can  be  stored  asymptotically  per  unit  volume 
in  n-dimensional  space  without  violating  the  (d,  k )  constraint. 
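
The definitions above can be checked by brute force on very small
rectangles. The following is only an illustrative sketch: the function
names `is_dk_constrained` and `count_patterns_2d` are hypothetical, and
exhaustive enumeration is an assumption made for clarity. It is feasible
only for a handful of cells and says nothing by itself about the limit that
defines the capacity.

```python
from itertools import product
from math import log2


def is_dk_constrained(seq, d, k):
    """True if the 0/1 sequence has at most k consecutive zeros and at
    least d zeros between every two ones."""
    run = 0            # length of the current run of zeros
    seen_one = False   # becomes True once the first one has been read
    for bit in seq:
        if bit == 0:
            run += 1
            if run > k:
                return False
        else:
            if seen_one and run < d:
                return False
            seen_one = True
            run = 0
    return True


def count_patterns_2d(m1, m2, d, k):
    """Brute-force count of (d, k)-constrained m1 x m2 patterns."""
    count = 0
    for cells in product((0, 1), repeat=m1 * m2):
        rows = [cells[i * m2:(i + 1) * m2] for i in range(m1)]
        cols = list(zip(*rows))
        if all(is_dk_constrained(line, d, k) for line in rows + cols):
            count += 1
    return count


if __name__ == "__main__":
    for m in (2, 3, 4):
        n_patterns = count_patterns_2d(m, m, 0, 1)
        # Finite-size ratio log2(N) / (m1 * m2); the capacity is its limit.
        print(m, n_patterns, log2(n_patterns) / (m * m))
```

For the (0, 1)-constraint the 2 x 2 and 3 x 3 counts are 7 and 63. For
k = d = 0 only the all-ones pattern survives, which is consistent with the
zero capacity noted for k = d.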

The study of 1-dimensional (d, k)-capacities was originally motivated by
applications in magnetic storage. Interest in 2-dimensional
(d, k)-capacities has recently increased due to emerging 2-dimensional
optical recording devices, and the multidimensional (d, k)-capacities may
play a role in future technologies as well. A tutorial on these topics is
given in [3].

In general, the exact values of the various n-dimensional (d, k)-capacities
are not known except in a few cases [6]. For example, in all dimensions, if
k = d the capacity is zero, and if d = 0 the capacity is positive for all
$k \ge 1$. In one dimension the capacity is positive whenever
$k > d \ge 0$. Very tight upper and lower bounds on the (0, 1)-capacity
were given for two dimensions in [1], improved in [2, 7], and extended to
three dimensions in [7]. In [9] an encoding procedure for the 2-dimensional
$(d, \infty)$-constraint was given for all positive integers d, and in [8]
an encoding procedure for the 2-dimensional (0, 1)-constraint was given
whose coding rate comes very close to the capacity. It was shown in [5]
that whenever $k > d \ge 1$, the 2-dimensional capacity is zero if and only
if k = d + 1.


II.  Main  Results 

We present two main results that characterize the zero capacity region for
finite dimensions and in the limit of large dimensions. The first result
generalizes the zero capacity characterization in [5] to all dimensions
greater than one; the characterization turns out to be exactly the same as
in dimension 2. The second result gives a necessary and sufficient
condition on d and k such that the capacity approaches zero in the limit as
the dimension n grows to infinity. These results are summarized in the
following two theorems.

Theorem 1 For every $n \ge 2$, $d \ge 1$, and $k > d$,

$$C^{(n)}_{d,k} = 0 \iff k = d + 1.$$

Theorem 2 For every $d \ge 0$ and $k \ge d$,

$$\lim_{n \to \infty} C^{(n)}_{d,k} = 0 \iff k \le 2d.$$
Abstract - An upper bound on the capacity of constrained three-dimensional
codes is presented. The bound for two-dimensional codes of Calkin and Wilf
was extended to three dimensions by Nagy and Zeger. Both bounds apply to
first order symmetric constraints. The bound in three dimensions is
generalized in a weaker form to higher order and non-symmetric constraints.

I.  Introduction 

In this paper we consider the capacity of constrained three-dimensional
(3-D) codes defined by a set of constraints. We consider shift invariant
constraints of finite extent (N, M, L), in the sense that the constraints
may be defined on an N by M by L volume. Each element is taken from an
alphabet A of size |A|. The $|A|^{NML}$ possible configurations on the
volume are divided into a set of admissible and a set of non-admissible
configurations. Let F(n, m, l) be the number of distinct admissible
configurations (or codewords) on an n by m by l volume not violating the
constraints within the volume. The per symbol capacity (or maximum
entropy) $C^{(3)}$ of the 3-D code defined by the constraints may be
defined as

$$C^{(3)} = \lim_{n,m,l \to \infty} \frac{\log F(n,m,l)}{nml}. \qquad (1)$$

A more formal treatment of the entropy definition and its existence is
given in [1].

Calkin and Wilf [2] presented a method giving tight bounds on capacity for
the (hard square) 2-D constraint with binary elements, which specifies that
no two 4-neighbors, i.e., horizontal or vertical neighbors, may both be
'1'. The upper bound in [2] is based on

$$\Lambda \le \operatorname{Trace}(T^{2p})^{1/2p}, \quad p > 0, \qquad (2)$$

where $\Lambda$ is the largest eigenvalue of T. Inequality (2) is valid for
real symmetric matrices, and it is applied to the transfer matrix of the
constraint in one direction. Nagy and Zeger [3] extended the results to the
3-D version of the constraint above, in which two directly neighboring '1's
in the direction of the third axis are also non-admissible. Let D denote
the dimension of the constraint. Their methods may be applied to other
constraints, but they are restricted to constraints for which the transfer
matrices are symmetric in at least D - 1 directions. This is satisfied for
constraints which are of 1st order and symmetric in (at least) all but one
direction. Here we address the problem of bounding capacity for higher
order and non-symmetric constraints in 3-D, e.g., limits on run-lengths or
distances (> 3).
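
The trace inequality (2) is easy to try on a toy case. The sketch below
uses the 2 x 2 transfer matrix of the one-dimensional "no two adjacent
ones" constraint, whose largest eigenvalue is the golden ratio. This
matrix, and the helper names `mat_mul`, `mat_pow` and `trace_bound`, are
assumptions chosen for illustration; the matrices arising for 2-D or 3-D
constraints are far larger.

```python
from math import sqrt


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, exponent):
    """a raised to a non-negative integer power by repeated multiplication."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = mat_mul(result, a)
    return result


def trace_bound(t, p):
    """Right-hand side of (2): Trace(T^(2p)) ** (1 / (2p))."""
    power = mat_pow(t, 2 * p)
    trace = sum(power[i][i] for i in range(len(t)))
    return trace ** (1.0 / (2 * p))


# Symmetric transfer matrix of the 1-D constraint "no two adjacent ones".
T = [[1, 1],
     [1, 0]]
GOLDEN_RATIO = (1 + sqrt(5)) / 2  # largest eigenvalue of T

if __name__ == "__main__":
    for p in range(1, 6):
        print(p, trace_bound(T, p), GOLDEN_RATIO)
```

In this example the bound decreases towards the largest eigenvalue as p
grows, which is why larger p gives tighter capacity bounds at the cost of
a larger computation.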

II. Upper Bound for Higher Order 3-D Constraints


In order to achieve an upper bound we shall specify a source which has the
required symmetric transfer matrices and which can generate, as a subset,
all configurations admissible by the original constraint. In [4] we
presented a way to do this in 2-D. Extending to 3-D leads to the following
construction. The states are defined by the admissible configurations
within 4 sub-states, which are rectangular boxes of equal size. The
sub-states forming one state must have the same boundary configuration of
width M - 1 in the m-direction and L - 1 in the l-direction. The states
extend N - 1 in the n-direction. The admissible transitions between the
combined states, in all generating $s_1$ by $s_2$ distinct elements, are
specified by $G_{s_1,s_2}$. The transitions are admissible iff the
transitions of the 4 sub-states are.

Theorem 1: The capacity of a 3-D code specified by shift invariant
constraints of finite extent (N, M, L) has the upper bound

$$C^{(3)} \le \frac{H''(s_1, s_2)}{s_1 s_2}, \qquad (3)$$

where $H''$ is the capacity determined by the logarithm of the largest
eigenvalue of $G_{s_1,s_2}$ of the given constraint, and $s_1$ and $s_2$
are positive even integers.

The principle of the proof is as follows. Inequality (2) is applied first
in one and then in another direction, as in [3]. We need to ensure that the
matrices are symmetric. Given a non-symmetric transfer matrix T (in one
direction), introduce $A = T^p$ and the symmetric matrix $C = A + A^*$,
where * denotes the transpose. Applying (2) to $C^2$, the bound is
asymptotically dominated by $\operatorname{Trace}(AA^*)$. $A^*$ may be
described as the reverse transition of A, so the trace counts
configurations which are given by two transitions starting and ending in
the same state. Using this in two directions leads to the construction
above and $G_{s_1,s_2}$.

We expect to achieve improved numerical results using (3) in 3-D, as we did
in 2-D [4], using the same approach to derive symmetric transfer matrices
generating all admissible configurations as a subset.
Abstract - The Hadamard matrix structure is applied to the construction of
space-time codes. Space-time Hadamard (STH) codes are statistically
analyzed with respect to diversity and coding gain criteria and are shown
to have good statistical properties.

I.  Introduction 

Two design criteria are derived for space-time codes in [1]. The
performance gain is shown to depend on the minimum rank and the minimum sum
of eigenvalues of the codeword difference matrices. Present code designs
consist of orthogonal block constructions [2], [3], which maximize the
rank, and empirically constructed convolutional codes or codes found using
exhaustive search algorithms. Finding a space-time code construction is
complicated because the codes exist in an infinite complex field instead of
a finite real field. STH codes offer a flexible design construction which
produces statistically good codes.

II. Space-Time Hadamard (STH) Code Construction

It is shown in [1] that, for the codeword matrix difference, maximizing the
rank corresponds to maximizing the rate at which the BER decreases. This is
the dominant gain criterion for asymptotic $E_b/N_0$. The space-time
Hadamard (STH) code construction is based on Hadamard matrices $H_n$ of
order $m = 2^n$. These codes are designed to give statistically good rank
properties which improve with increasing constellation size and code
length. STH codes can be recursively constructed as follows.


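Let $\Lambda_{n+1} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{2^{n+1}})$
be a diagonal matrix of eigenvalues of the codeword c. The recursive
codeword matrix is $W_{(n+1)}(l) = 2^{-(n+1)} H_{n+1} \Lambda_{n+1} H_{n+1}$.
Here l corresponds to the order of the direct sum extension of STH codes,
where $W_n(l)$ has dimensions $n \times l \cdot n$. Denote
$\Lambda_n^{(1)} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{2^n})$ and
$\Lambda_n^{(2)} = \operatorname{diag}(\lambda_{2^n+1}, \lambda_{2^n+2}, \ldots, \lambda_{2^{n+1}})$.

Then

$$H_{n+1} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}$$

and

$$W_{(n+1)}(l) = 2^{-(n+1)} H_{n+1} \Lambda_{n+1} H_{n+1}
= 2^{-(n+1)} \begin{pmatrix}
H_n(\Lambda_n^{(1)} + \Lambda_n^{(2)})H_n & H_n(\Lambda_n^{(1)} - \Lambda_n^{(2)})H_n \\
H_n(\Lambda_n^{(1)} - \Lambda_n^{(2)})H_n & H_n(\Lambda_n^{(1)} + \Lambda_n^{(2)})H_n
\end{pmatrix}.$$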

The factor $2^{-(n+1)}$ is a normalization factor. The symmetric matrix has
the first row

$$(w_1, w_2, \ldots, w_{2^{n+1}}) = 2^{-(n+1)} (\lambda_1, \lambda_2, \ldots, \lambda_{2^{n+1}}) H_{n+1}.$$

All other rows are permutations of the first row.
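
The row structure just described can be checked numerically. The sketch
below assumes that the Hadamard matrices are of Sylvester type (built by
the doubling recursion), which is an assumption consistent with the block
form of the codeword matrix; the names `sylvester` and `sth_matrix` and the
sample eigenvalues are hypothetical.

```python
def sylvester(m):
    """Sylvester-type Hadamard matrix of order 2**m (entries +1 / -1)."""
    h = [[1]]
    for _ in range(m):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def sth_matrix(eigs):
    """W = H * diag(eigs) * H / order, where the order len(eigs) must be a
    power of two."""
    size = len(eigs)
    m = size.bit_length() - 1
    if 1 << m != size:
        raise ValueError("number of eigenvalues must be a power of two")
    h = sylvester(m)
    return [[sum(h[i][t] * eigs[t] * h[t][j] for t in range(size)) / size
             for j in range(size)]
            for i in range(size)]


if __name__ == "__main__":
    w = sth_matrix([1, 2, 3, 5])   # arbitrary illustrative eigenvalues
    for row in w:
        print(row)
```

With this ordering, entry (i, j) of W equals entry i XOR j of the first
row, so every row is the first row with its entries permuted, and W is
symmetric.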

The  basic  structure  of  STH  codes  can  be  modified  to  give 
different  code  parameters.  By  selecting  a  subset  of  T  columns 
of  the  codeword  matrix,  we  obtain  the  T-reduced  STH  code. 
By  taking  the  direct  sum  of  several  STH  matrices,  we  obtain 
extended  STH  codes  where  l  >  1. 

III.  Statistical  Properties 

Table 1 shows that as the constellation size and/or the extension order l
increases, the fraction of full rank codewords approaches 1. Statistical
analysis shows that as the rank decreases, the eigenvalue sum increases,
which also produces a performance gain. Statistically good codes can be
constructed for small constellations and code sizes.

Table  1:  Rank  statistics  for  STH  codes  W^1^/)  over  dif¬ 
ferent  constellations. 


IV.  Conclusions 

The STH code construction is presented and evaluated based on the rank and
eigenvalue design criteria [1]. This code construction shows good
statistical properties. The basic construction can be adapted to give
different code parameters. The code properties improve with the code length
and constellation size.
Abstract - The application of Maximum Rank Distance (MRD) codes is
investigated with respect to the space-time code scenario. A construction
method is presented based on primitive polynomials over extended Galois
fields. A one-to-one mapping is then performed between the Galois field
code symbols and the complex transmission symbols.

I.  Introduction 

In [2] the design criteria for space-time codes were derived, which showed
that for asymptotic $E_b/N_0$ the rate of performance gain is dominated by
the minimum rank of the codeword difference matrices (diversity gain).
Present space-time constructions include the class of orthogonal space-time
codes, where all codeword matrix columns are mutually orthogonal, and
convolutional space-time codes, which are constructed by hand or found
through exhaustive searches. Both types of codes seek to maximize the
minimum rank over the set of all codeword differences.

The rank code construction based on [1] is applied to space-time codes. The
codeword rank is maximized to give maximum rank distance (MRD) codes. This
results in a new design technique for space-time code construction.

II.  MRD  Code  Construction 

We define the matrix primitive polynomial as

$$f(x) = x^T + b_{T-1}x^{T-1} + b_{T-2}x^{T-2} + \cdots + b_1 x + b_0, \qquad (1)$$

where $b_i \in GF(q)$, $i = 0, 1, \ldots, T-1$, with the restriction that
$b_T = 1$. $\beta \in GF(q)$ is a primitive element such that
$f(\beta) = 0$. Furthermore, let p(x) be the element primitive polynomial
for the extension field $q = p^s$:

$$p(x) = x^s + a_{s-1}x^{s-1} + a_{s-2}x^{s-2} + \cdots + a_1 x + a_0, \qquad (2)$$

where $a_i \in GF(p)$, $i = 0, 1, \ldots, s-1$, with the restriction that
$a_s = 1$. $\alpha$ is a primitive element of GF(p) such that
$p(\alpha) = 0$, and p is a prime number. The associated (primitive) matrix
C constructed from $f(\beta)$ is written as

$$C = \begin{pmatrix}
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
b_0 & b_1 & \cdots & b_{T-1}
\end{pmatrix}.$$

The elements $b_i$ can be represented in terms of $\alpha$ based on
equation (2). This primitive matrix has properties analogous to those of
primitive elements for vector fields.

The resulting T x T matrices

$$\mathcal{C} = \{0, C, C^2, \ldots, C^{q^T-2}, C^{q^T-1}\} \qquad (3)$$

define an MRD code of cardinality $q^T$ with rank distance T. Furthermore,
these codes are linear [1].

We now map the GF(q) elements to a complex signal constellation. Let $A_c$
be a complex signal constellation of size q. Define a one-to-one map
$GF(q) \Leftrightarrow A_c$. If
$GF(q) = \{\alpha^{-\infty} = 0\} \cup \{\alpha^i,\ i = 0, 1, \ldots, q-2\}$
and $|A_c| = q$, then we define the mapping
$0 = \alpha^{-\infty} := \gamma_{\alpha^{-\infty}}$ and
$\alpha^i := \gamma_{\alpha^i}$, $i = 0, 1, \ldots, q-2$.

Consider an MRD code $\mathcal{C}$ of T x T matrices with rank distance T
generated by equation (3). We replace every element of GF(q) by the
corresponding element from the constellation $A_c$ using the defined
mapping. This gives the code $\mathcal{C}(A_c)$ over the constellation
$A_c$. For a given constellation, we have to verify whether the resulting
code is MRD.

III. Search Results for MRD Codes

The authors have found that the mapping from GF(2) MRD codes to any complex
binary constellation produces complex MRD codes. For the 2 x 2, $GF(2^2)$
MRD code, it has been shown that the 4-PSK constellation produces a complex
MRD code.

We note that the complex MRD code space is a finite subset of an infinite
complex space. The challenge is in finding a modulation alphabet which
produces a complex MRD code and which is still practical for an information
system.
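
The construction in equations (1) and (3) can be tried out in the smallest
interesting case. The sketch below works over GF(2) with T = 3 and the
primitive polynomial $x^3 + x + 1$; this choice, the BPSK-style mapping
0 -> +1, 1 -> -1, and all function names are assumptions made for
illustration, not parameters taken from the text.

```python
from fractions import Fraction
from itertools import combinations


def companion(b):
    """Matrix with ones on the superdiagonal and last row b_0 .. b_{T-1}."""
    t = len(b)
    rows = [[1 if j == i + 1 else 0 for j in range(t)] for i in range(t - 1)]
    rows.append(list(b))
    return rows


def mul_gf2(a, b):
    t = len(a)
    return [[sum(a[i][s] & b[s][j] for s in range(t)) % 2 for j in range(t)]
            for i in range(t)]


def rank_gf2(matrix):
    """Rank over GF(2); each row is packed into an integer bit mask."""
    rows = [int("".join(str(x) for x in row), 2) for row in matrix]
    rank = 0
    for bit in reversed(range(len(matrix[0]))):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank


def rank_rational(matrix):
    """Rank over the rationals by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def mrd_code(b):
    """The zero matrix together with C, C^2, ..., C^(2^T - 1) over GF(2)."""
    t = len(b)
    c = companion(b)
    code = [[[0] * t for _ in range(t)]]
    power = c
    for _ in range(2 ** t - 1):
        code.append(power)
        power = mul_gf2(power, c)
    return code


def xor_diff(a, b):
    return [[x ^ y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bpsk_diff(a, b):
    # Map 0 -> +1 and 1 -> -1, then subtract the mapped matrices.
    return [[(1 - 2 * x) - (1 - 2 * y) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def min_rank_distance(code, rank_fn, diff_fn):
    return min(rank_fn(diff_fn(a, b)) for a, b in combinations(code, 2))


if __name__ == "__main__":
    code = mrd_code((1, 1, 0))   # f(x) = x^3 + x + 1
    print(len(code), min_rank_distance(code, rank_gf2, xor_diff),
          min_rank_distance(code, rank_rational, bpsk_diff))
```

In this small case all pairwise differences have full rank both over GF(2)
and after the binary mapping. For the binary mapping this follows because
each mapped difference is twice an integer matrix whose determinant is odd
whenever the GF(2) difference is nonsingular.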


IV.  Conclusions 

The matrix rank of codewords is maximized to give maximum rank distance
(MRD) codes. These codes are first constructed in GF(q) and then mapped to
a complex signal constellation. The properties of the chosen signal
constellation determine the resulting code, which may no longer be MRD.
MRD codes exist for all binary constellations, and for the 2 x 2 MRD code
over 4-PSK.
Abstract - Recently, a general approach to differential modulation for
multiple transmit and receive antennas was proposed by Hughes, and by
Hochwald and Sweldens. In this approach, data are differentially encoded
using a restricted class of space-time group codes in which each code
matrix is square and has equal-energy, orthogonal rows. In this talk, we
remove the restrictions imposed in earlier work and extend the theory of
differential transmission to arbitrary Slepian-type group codes. This
extension leads to new modulation techniques that significantly outperform
previously known methods, both for single and multiple antenna systems.
Applications to universal channel coding for discrete memoryless channels
are also discussed.

I.  Summary 

In wireless communication, fading due to multipath signal propagation often
has a severe impact on system performance. One way to improve performance
is to increase diversity through the use of multiple antennas at the
transmitter and/or receiver. Modulation techniques designed for multiple
transmit antennas, called space-time modulation or transmit diversity, have
been shown to be highly effective in reducing the effects of fading, and
can also dramatically increase the capacity of multipath radio channels,
especially when combined with multiple antennas at the receiver.

In recent years, a wealth of space-time coding and modulation techniques
have been proposed. Most early work focused on the coherent case, when
accurate channel estimates are available at the receiver but not the
transmitter. More recently, there has also been considerable interest in
the noncoherent case, when channel estimates are not available at the
transmitter or receiver. In this case, Marzetta and Hochwald [3] have shown
that, for large signal-to-noise ratios, the capacity of a multi-antenna
quasi-static Rayleigh fading channel is approached by unitary space-time
block codes, in which the signals transmitted by different antennas have
equal energy and are mutually orthogonal.

Recently, Hughes [2] and Hochwald and Sweldens [1] independently proposed a
general approach to differential modulation for multiple transmit and
receive antennas (see [4, 5] for other approaches). In this approach, data
are differentially encoded using a restricted class of space-time group
codes in which each code matrix is square and has equal-energy, orthogonal
rows.

1 This work was supported in part by the National Science Foundation under
grant CCR-9903107, and by the Center for Advanced Computing and
Communication.


In this talk, we remove the restrictions imposed in earlier work and extend
the theory of differential transmission to arbitrary Slepian-type group
codes. For a system with t transmit antennas, we consider block codes in
which each code matrix is of the form

$$C = DG,$$

where D is a fixed t x n complex matrix, and where G belongs to an
algebraic group $\mathcal{G}$ of n x n unitary matrices ($GG^* = I$). Here,
the rows of C represent symbols transmitted by different antennas, and the
columns represent symbols transmitted at different times. Using this code,
a sequence of messages $G_k \in \mathcal{G}$ can be differentially encoded
in a way similar to differential PSK: at time k = 0 we send the block
$C_0 = D$ to initialize transmission. Thereafter, to send message $G_k$ in
block k, we send

$$C_k = C_{k-1} G_k, \quad k = 1, 2, \ldots$$

The group property ensures that $C_k \in D\mathcal{G}$ for all k.
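
The encoding recursion can be illustrated with a toy group. In the sketch
below the group consists of the four rotations of the plane by multiples of
90 degrees, D is an arbitrary 1 x 2 matrix, and the "channel" is a single
unknown complex gain with no noise. All of these choices, and the names
`encode` and `decode`, are assumptions for illustration; this is not the
receiver derived in the work.

```python
def mat_mul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[i][s] * b[s][j] for s in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical example group: rotations by 0, 90, 180 and 270 degrees.
R = [[0, -1], [1, 0]]
GROUP = [[[1, 0], [0, 1]]]
for _ in range(3):
    GROUP.append(mat_mul(GROUP[-1], R))

D = [[2, 1]]  # arbitrary fixed t x n matrix, here t = 1 and n = 2


def encode(messages):
    """C_0 = D, then C_k = C_{k-1} G_k for each message index."""
    blocks = [D]
    for m in messages:
        blocks.append(mat_mul(blocks[-1], GROUP[m]))
    return blocks


def decode(received):
    """Noiseless toy decoder: find the G with Y_{k-1} G = Y_k."""
    out = []
    for prev, cur in zip(received, received[1:]):
        out.append(next(i for i, g in enumerate(GROUP)
                        if close(mat_mul(prev, g), cur)))
    return out


if __name__ == "__main__":
    messages = [1, 3, 2, 0, 1]
    blocks = encode(messages)
    gain = 2 - 3j   # unknown to the decoder, constant over the sequence
    received = [[[gain * x for x in row] for row in blk] for blk in blocks]
    print(decode(received) == messages)
```

Because each received block equals the previous one multiplied by $G_k$,
the constant channel gain cancels in the comparison, which is the same
mechanism that lets differential PSK work without a phase reference.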

In this work, we consider transmission of the differentially encoded
sequence $C_k$ over a flat-fading Rayleigh channel in the presence of
additive Gaussian noise. We derive a differential receiver that reliably
recovers $G_k$ without knowledge of current channel fading conditions. We
further derive a bound on the error probability of this receiver as well as
modulator design criteria. The design criteria lead to new differential
modulation techniques for both single and multiple antenna channels that
significantly outperform the best codes in [1, 2]. Extensions to universal
channel coding for discrete memoryless channels are also discussed.
Abstract - Many wireless systems today employ error correcting coding.
Adding transmit diversity may further improve the performance. We study the
options to achieve this goal without significant change to the existing
systems.


I.  Introduction 

Transmit diversity is an effective technique to mitigate channel fading in
wireless communication systems. Many space-time coding (STC) techniques
have been suggested to provide transmit diversity [1, 2].

Most modern wireless systems employ forward error-correcting coding (FEC).
Providing additional transmit diversity to such systems is of practical
interest and also challenging. In this paper, we study several possible
techniques to achieve the diversity gain.


II.  Combination  of  FEC  and  Antenna  Hopping 

One simple way to achieve transmit diversity for coded systems is to use
the Alamouti STC scheme [3]. A possibly simpler method is to use antenna
hopping [4]. The coded bits from the FEC encoder are first interleaved and
then transmitted alternately, in bursts, through multiple transmit
antennas.

Without coding this scheme clearly provides no diversity. The bit-error
rate can be very high if the path from any antenna is in a deep fade.
However, coding groups together many symbols with potentially different
fading levels into one codeword. For a powerful FEC code with large free
Hamming distance, diversity combining can take place during the calculation
of codeword metrics for maximum likelihood decoding.

To obtain more insight, we consider the pairwise error probability between
a pair of codewords with Hamming distance d, for a two-antenna system. We
assume that each bit is equally likely to be transmitted from either
antenna. Then the probability that a total of k out of d bits are
transmitted from antenna 1 follows the binomial distribution
$\binom{d}{k} 0.5^d$. For Rayleigh channels with AWGN, we obtain an upper
bound on the average pairwise error probability (APEP) for a two-antenna
system


APEP 


< 


1+k  1  +  iLp  . 


(i) 


where $E_s$ is the energy per symbol and $N_0$ is the noise density. For
large d, a diversity order of two is achieved with high probability. We
will show that the bounds for antenna hopping and the Alamouti scheme are
quite close for large d. Calculation of the channel cutoff rate also shows
that the performance of the two schemes is close. Note that the binomial
distribution is a conservative estimate. For practical coding schemes, a
simple (even-odd) hopping can often ensure a diversity order of two.
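
The statement that a diversity order of two is achieved with high
probability for large d can be made concrete with the binomial model. The
sketch below assumes, as an interpretation, that diversity order two
requires at least one of the d differing bits on each antenna; the function
names are hypothetical.

```python
from math import comb


def antenna_split_pmf(d):
    """P(k of the d differing bits use antenna 1) = C(d, k) * 0.5**d."""
    return [comb(d, k) * 0.5 ** d for k in range(d + 1)]


def prob_both_antennas_used(d):
    """Probability that neither antenna carries all d differing bits."""
    if d < 2:
        return 0.0
    pmf = antenna_split_pmf(d)
    return 1.0 - pmf[0] - pmf[d]


if __name__ == "__main__":
    for d in (2, 5, 10, 20):
        print(d, prob_both_antennas_used(d))
```

Under this model the probability of losing the second diversity branch is
$2 \cdot 0.5^d$, which falls off exponentially with the Hamming distance d.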


For a large number of transmit antennas, it is difficult to design STC
based on orthogonal design [2]. In this case, we can combine antenna
hopping with STC. For example, we can partition four antennas into two
two-antenna groups. The interleaved FEC output is switched alternately, in
bursts, to the two groups. In each group, an STC is used for the two
antennas. With this method, more transmit antennas can be used to increase
the diversity order.

In Figure 1, we compare the performance of FEC concatenated with a single
antenna, two-antenna (even-odd) hopping, two-antenna STC, four-antenna
hopping, group hopping with two-antenna STC, and a hypothetical
four-antenna STC. The FEC code is a rate-1/3 parallel concatenated
convolutional (turbo) code with 16-state component codes.


Figure 1: Performance of turbo-coded BPSK over block Rayleigh fading
channels.

III. Other Concatenation Methods

We will also discuss the design of, and show results for, space-time turbo
coding and other trellis based methods.
Abstract - A new algorithm for computing the free distance of turbo codes
is applied to the CCSDS and the UMTS standard codes. Results on the free
distance behaviour for increasing interleaver length are also presented.

I.  Introduction 

It is known that turbo codes may have low free distances $d_{free}$ despite
very large interleaver lengths N. This causes their BER curves to flatten,
following the "error floor" imposed by $d_{free}$, after the "water-fall"
decrease at low signal-to-noise ratios. This behaviour may not be
acceptable for applications requiring very low bit error rates
(BER < $10^{-8}$ to $10^{-10}$). In [1] we have developed a new algorithm
for computing the free distance $d_{free}$ of parallel and serially
concatenated codes with interleavers, based on the new notion of
constrained subcodes. The algorithm permits the computation of large
distances without any constraint on the input sequence weight. Since
$d_{free}$ and its multiplicity dominate the code performance at very high
signal-to-noise ratios, knowing them allows the code error floor to be
estimated analytically, i.e., the code performance at very low error
probabilities, where simulation is not feasible.

As a first example of application, we present some results concerning the
free distance behaviour of turbo codes with growing interleaver length.
They provide some information on two issues: (i) the improvement in terms
of error floor potentially available by increasing the interleaver length,
and (ii) the probability of choosing at random an "optimal" (in terms of
$d_{free}$) interleaver of a certain length. In Fig. 1 we report the
behaviour of the best and the average free distance for 16-state rate-1/3
turbo codes, obtained by randomly generating a very large number of turbo
codes and evaluating their free distance.


Fig. 1: Distribution of $d_{free}$ for 16-state rate-1/3 turbo codes.

Fig. 2: The error floors for the Bit Error Rates of the rate-1/3 UMTS turbo
code and the other two codes (horizontal axis: $E_b/N_0$).

II.  Application  1:  the  CCSDS  standard 

Recently, the CCSDS telemetry channel coding standard has been updated to
include turbo codes. They consist of the parallel concatenation of two
16-state rate-1/4 binary convolutional encoders and a block interleaver
(Berrou's analytical interleaver) with length N = 1784, 3568, 7136, 8920,
or 16384. Four nominal code rates 1/r, for r = 2, 3, 4, and 6, can be
obtained through puncturing.

We have successfully applied our new algorithm to the whole class of CCSDS
turbo codes with N = 1784. Denoting by $(d_{free}/N_{free}/w_{free})$ their
free distance, multiplicity, and input bit multiplicity, the results are
(17/2/6), (32/1/2), (42/1/2), and (70/1/2) for r = 2, 3, 4, and 6,
respectively.
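
To see how such triples translate into an error floor, one can use the
standard free-distance asymptote for the bit error rate,
$P_b \approx (w_{free}/N)\, Q(\sqrt{2 R\, d_{free} E_b/N_0})$. This formula
is not stated in the text and is assumed here, with $w_{free}$ read as the
total input weight of the minimum-distance codewords; the function names
are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_floor(d_free, w_free, interleaver_len, rate, ebn0_db):
    """Free-distance asymptote of the bit error rate (assumed formula)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return (w_free / interleaver_len) * q_func(math.sqrt(2.0 * rate * d_free * ebn0))


# (d_free, N_free, w_free) for the CCSDS codes with N = 1784, keyed by r.
CCSDS_N1784 = {2: (17, 2, 6), 3: (32, 1, 2), 4: (42, 1, 2), 6: (70, 1, 2)}

if __name__ == "__main__":
    for r, (d_free, _n_free, w_free) in CCSDS_N1784.items():
        for ebn0_db in (1.0, 2.0, 3.0):
            print(r, ebn0_db, ber_floor(d_free, w_free, 1784, 1.0 / r, ebn0_db))
```

The asymptote depends on the product of rate and free distance, so a lower
rate code with a much larger $d_{free}$ can have a markedly lower floor at
the same $E_b/N_0$.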

III.  Application  2:  the  UMTS  standard 

The UMTS/3GPP standard for third generation personal communications will
use a turbo code. Its encoder consists of the parallel concatenation of two
8-state rate-1/2 binary convolutional encoders and a block interleaver with
length N, with $320 \le N \le 5120$. Two nominal rates 1/r, with r = 2 and
3, can be obtained through puncturing. For the rate-1/3 code with N = 320,
we have applied the new algorithm and obtained $d_{free} = 24$,
$N_{free} = 1$ and $w_{free} = 4$. For comparison, we have considered the
classes of 8-state parallel concatenated codes (PCCC) and 4-state serially
concatenated codes (SCCC) employing spread interleavers. At best, we have
obtained two codes yielding (21/2/6) and (38/1/2), respectively. The error
floors for these three codes are depicted in Fig. 2. We can observe that
the UMTS turbo code has very good performance at very low BER, better than
the best spread-interleaver PCCC found, although an SCCC could outperform
it.
Abstract - New upper bounds on the error probabilities of coded systems,
such as turbo codes, on the additive white Gaussian noise and fading
channels were obtained.

I.  A  New  Bound  for  AWGN  Channels 

For binary (n, k) block codes, which include turbo and serial codes, the
bit and word error probabilities can be bounded, using a technique in [1],
by [2]

$$P_e \le \sum_{h=h_{\min}}^{n-k+1} \min\left\{ e^{-nE(c,h)},\ e^{nr(\delta)}\, Q\!\left(\sqrt{2ch}\right) \right\},$$

where

$$E(c,h) = \frac{1}{2}\ln\left[1 - 2c_0(\delta) f(c)\right] + \frac{c f(c)}{1 + f(c)}, \qquad c_0(\delta) < c < \frac{e^{2r(\delta)} - 1}{2\delta(1-\delta)},$$

and otherwise $E(c,h) = -r(\delta) + \delta c$. Also $\delta = h/n$,
$c = R_c E_b/N_0$,

$$c_0(\delta) = \left(1 - e^{-2r(\delta)}\right)\frac{1-\delta}{2\delta}, \qquad f(c) = \sqrt{\frac{c}{c_0(\delta)} + 2c + c^2} - c - 1.$$

For bit error probability $r(\delta) = \frac{1}{n}\ln\sum_w \frac{w}{k} A_{w,h}$
($A_{w,h}$ is the input-output weight distribution), and for word error
probability $r(\delta) = \frac{1}{n}\ln A_h$ ($A_h$ is the output weight
distribution). This is the tightest "closed form" upper bound on decoding
error rate. The minimum $E_b/N_0$ threshold can be computed as
$(E_b/N_0)_{\text{threshold}} = \frac{1}{R_c}\max_\delta c_0(\delta)$. In
[2] we proved that the threshold for the Poltyrev bound (see [3] and
references therein) is the same as our threshold; thus the proposed bound
is as tight as the Poltyrev bound for very large block size n. The simple
bound for the AWGN channel is applied to obtain the ML performance of rate
1/4 Repeat Accumulate (RA) codes, as shown in Fig. 1. The figure also shows
the performance of the suboptimum iterative turbo decoder for RA codes.
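
A small numerical sketch may help in reading the word error version of the
AWGN bound. It assumes the exponent has the two-branch form
$E = \frac{1}{2}\ln[1 - 2c_0 f] + cf/(1+f)$ for
$c_0 < c < (e^{2r} - 1)/(2\delta(1-\delta))$ and $E = -r + \delta c$
otherwise, with $c_0 = (1 - e^{-2r})(1-\delta)/(2\delta)$ and
$f = \sqrt{c/c_0 + 2c + c^2} - c - 1$. The (7, 4) Hamming code weights used
as input and all function names are hypothetical choices for illustration.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def c0(r, delta):
    return (1.0 - math.exp(-2.0 * r)) * (1.0 - delta) / (2.0 * delta)


def exponent(c, r, delta):
    """Two-branch exponent E(c, h) with delta = h / n."""
    if delta >= 1.0:
        return -r + delta * c
    low = c0(r, delta)
    high = (math.exp(2.0 * r) - 1.0) / (2.0 * delta * (1.0 - delta))
    if low < c < high:
        f = math.sqrt(c / low + 2.0 * c + c * c) - c - 1.0
        return 0.5 * math.log(1.0 - 2.0 * low * f) + c * f / (1.0 + f)
    return -r + delta * c


def word_error_bound(weights, n, k, ebn0_db):
    """Sum over h <= n-k+1 of min{exp(-n E), A_h Q(sqrt(2 c h))}."""
    c = (k / n) * 10.0 ** (ebn0_db / 10.0)
    total = 0.0
    for h in range(1, n - k + 2):
        a_h = weights.get(h, 0)
        if a_h == 0:
            continue
        r = math.log(a_h) / n          # so that exp(n r) = A_h
        chernoff = math.exp(-n * exponent(c, r, h / n))
        union = a_h * q_func(math.sqrt(2.0 * c * h))
        total += min(chernoff, union)
    return total


HAMMING_7_4 = {3: 7, 4: 7, 7: 1}   # nonzero codeword weights

if __name__ == "__main__":
    for ebn0_db in (2.0, 4.0, 6.0):
        print(ebn0_db, word_error_bound(HAMMING_7_4, 7, 4, ebn0_db))
```

Two sanity checks on the assumed formulas: the exponent vanishes at
$c = c_0(\delta)$, and the two branches agree at the upper end of the
interval, so E is continuous in c. For such a short code the union term is
usually the smaller of the two; the exponential term matters for long codes
where $e^{nr(\delta)}$ is very large.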


Figure 1: ML upper bound on the bit error probability of rate 1/4 RA codes
over the AWGN channel

*The  work  described  was  funded  by  the  TMOD  Technology  Program  and 
performed  at  the  Jet  Propulsion  Laboratory,  California  Institute  of  Technology 
under  contract  with  the  National  Aeronautics  and  Space  Administration. 


II.  A  New  Bound  for  Fading  Channels 
For  independent  Rayleigh  fading  with  CSI 

$\delta$, c, and $r(\delta)$ are defined as in the previous section. The
maximum with respect to $\varphi$ can be obtained in closed form; the
remaining maximizations must be performed numerically. The simple bound for
the Rayleigh fading channel is applied to obtain the ML performance of rate
1/4 Repeat Accumulate (RA) codes, as shown in Fig. 2. The figure also shows
the performance of the suboptimum iterative turbo decoder for RA codes over
independent Rayleigh fading with CSI.


Figure 2: ML upper bound on the bit error probability of rate 1/4 RA codes
for the Rayleigh fading channel
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Abstract  -  The  number  of  errors  that  a  convolutional  codes  can 
correct  in  a  segment  of  the  encoded  sequence  is  upper  bounded  by 
the  number  of  distrinct  syndrome  sequences  of  the  relevant  length. 

I.  Introduction 

We  shall  analyse  the  error  correcting  power  of  a  convolutional  code 
by  relating  the  number  of  correctable  errors  to  the  available  syndromes. 
The  results  are  related  to  the  bound  in  [1],  but  we  take  a  more  direct 
approach.  Syndromes  for  convolutional  codes  have  not  received  much 
attention since the structural results appeared in [2]. The main difficulty
compared to block codes is that different sequence lengths have
to  be  considered.  Even  though  a  Hamming  type  upper  bound  usually 
cannot  be  reached,  it  is  an  important  estimate  of  the  number  of  errors 
that  can  be  corrected  with  high  probability  by  a  typical  code. 


II. Correctable Errors for Short Sequences

In [3] a general method for relating bounds for block codes to
convolutional codes was introduced. Thus an upper bound on the number
of errors that can be corrected independent of their location, $t_0$, may
be derived from the Hamming bound for block codes. However, a direct
analysis of errors and syndromes in convolutional codes gives a tighter
bound, since some error patterns give rise to short syndromes.

Theorem 1: If a binary (n,k) convolutional code with encoder memory
M (blocks) and syndrome former memory M' corrects all combinations
of $t_0$ errors, the inequality

$$\sum \binom{ns}{j} \, c(j) < 2^{(n-k)(s+M')}$$

is satisfied for any s > M and j < $t_0$. Here c(j) is the number of truncated
codewords of weight j.

The bound will be applied to examples of short high rate codes, and
we shall demonstrate how the factor c(j) makes the bound sharper than
the translated Hamming bound. It is essential to the performance of
convolutional codes that the number of correctable errors increases with
the length of the sequence. Thus we are interested in the number of
errors, $t_j$, that can be corrected in a sequence of length j blocks,
provided that no more than $t_0$ errors occur in a sequence of length j-1.
This approach may be extended to yield a description of distributions
of correctable errors in short sequences.

III. A Variable Length Description of Errors

An obvious question about a convolutional code is, how often can a
burst of L errors be corrected? Our first approach above does not give
a convenient answer to questions of this type, since the syndromes are
simply assumed to be zero outside the window under consideration.
Thus we seek a rule for segmenting the error pattern into finite strings
in such a way that any concatenation of correctable strings forms a
correctable error sequence. This gives a variable length description of
the correctable error patterns which may be related to a segmentation
of the syndrome sequence. The segments may be mapped on the leaves
of a tree where the branches are labeled by the syndrome bits.

IV. An Upper Bound by the Kraft Inequality

We may obtain a Hamming type upper bound by relating the error
sequence and the syndrome sequence through a version of Kraft's
inequality:

Theorem 2: For a tree of correctable error patterns, the number of paths
of length L (blocks) is A(L). Then the number of check symbols per
block, r, must satisfy

$$\sum_{L} A(L) \, 2^{-rL} \le 1$$

This version of the upper bound indicates that for short codes there is
a trade-off between a high value of $t_0$ and a rapid increase in the
number of correctable errors with the length of the sequence. Clearly
for long codes, the fraction of errors is given by the asymptotic
Hamming bound.
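
As a small illustration of the Kraft-type test in Section IV, the following sketch evaluates the sum for a partial description of a tree of correctable error patterns. It assumes that the inequality of Theorem 2 has the usual Kraft form, $\sum_L A(L) 2^{-rL} \le 1$; the function names and the example path counts are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def kraft_sum(path_counts, r):
    """Sum of A(L) * 2^(-r*L) over all path lengths L (in blocks).

    path_counts maps a length L to the number of paths A(L);
    r is the number of check (syndrome) symbols per block.
    """
    return sum(Fraction(count, 2 ** (r * length))
               for length, count in path_counts.items())


def satisfies_kraft(path_counts, r):
    """True if the tree description is compatible with the Kraft-type bound."""
    return kraft_sum(path_counts, r) <= 1


# Hypothetical tree: one path of length 1 and two paths of length 2.
example = {1: 1, 2: 2}
print(kraft_sum(example, r=1), satisfies_kraft(example, r=1))
```

Exact rational arithmetic is used so that a tree which meets the bound with equality is not rejected by rounding.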

V.  Relation  to  the  Bound  by  Finite  State  Algorithms 

While  the  bound  of  Theorem  2  gives  a  convenient  way  of  testing  partial 
descriptions  of  error  patterns,  the  variable  length  description  usually 
leads  to  an  infinite  tree.  Thus  a  complete  weight  specification  is  natu¬ 
rally  described  by  a  finite  state  algorithm,  and  we  arrive  at  the  upper 
bound discussed in [1].

Abstract - We derive bounds on the error probability
of ML decoded LDPC codes, for any binary-input
symmetric-output channel. For appropriately chosen
ensembles of LDPC codes, reliable communication is
possible up to channel capacity. The lower and upper
bounds coincide asymptotically, indicating a polynomially
decreasing ensemble averaged error probability.
For ensembles with suitably chosen parameters,
the error probability of almost all codes is exponentially
decreasing. Furthermore, the error exponent
can be set arbitrarily close to the standard random
coding exponent.

I.  Introduction 

In  this  paper  we  examine  the  error  probability  of  optimal 
(Maximum  Likelihood)  decoding  of  low  density  parity  check 
(LDPC)  codes,  first  introduced  by  Gallager  [1]  in  1963. 

We  consider  two  ensembles  of  LDPC  codes.  The  first  en¬ 
semble  was  proposed  by  Mackay  [2].  The  second  ensemble  is 
based  on  bipartite  regular  graphs,  and  was  used  by  several 
researchers,  e.g.  [3]. 

II.  An  independent  matrix  column  ensemble 

We consider the ensemble of parity check matrices $A_{L \times N}$
(corresponding to a code with block length N and L parity
check equations) defined by applying the following procedure
to each column of A, for some integer t. First set the entire
column to 0's. Then t times an index is drawn uniformly and
independently from {1, 2, ..., L} and the corresponding bit is
flipped. We claim the following:
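
The column procedure above can be sketched in a few lines. This is only an illustration of the sampling rule as described in the text; the function name, the 0-based row indices and the toy sizes are assumptions made for the example.

```python
import random


def sample_parity_check_matrix(L, N, t, rng):
    """Draw an L x N binary matrix from the independent-column ensemble.

    Every column starts as all zeros; then, t times, a row index is drawn
    uniformly and independently and the corresponding bit is flipped.
    """
    A = [[0] * N for _ in range(L)]
    for col in range(N):
        for _ in range(t):
            row = rng.randrange(L)  # uniform over the L rows (0-based here)
            A[row][col] ^= 1        # flip the selected bit
    return A


rng = random.Random(0)
A = sample_parity_check_matrix(L=6, N=12, t=3, rng=rng)
column_weights = [sum(A[r][c] for r in range(6)) for c in range(12)]
print(column_weights)
```

Because the same index may be drawn more than once, a column can have weight smaller than t, but its weight always has the same parity as t. In particular an all-zero column is possible only for even t.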

Theorem 1 Consider the ensemble of binary parity check
matrices $A_{L \times N}$ described above, over a memoryless binary-input
symmetric-output channel. Let C denote channel capacity,
R = 1 - L/N and suppose that the following conditions
are satisfied for t > 3 and some $0 < \gamma < 1/2$ and K > 0:

61n m-M  <  g  (1) 

h,( 7)  +  (1  -  R)  (log  (l  +  e~4e_12-K  )  -  l)  <  0  (2) 

and 

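$$R + G(R, \gamma t) < C \qquad (3)$$

where

$$G(R, \gamma t) = \max_{0 < x < 1/2} \{(1 - R) h_2(x) + \gamma t \log(1 - 2x)\}$$

Denote the ensemble averaged maximum likelihood decoding
error probability by $\bar{P}_e$. Then

$$\lim_{N \to \infty} \frac{-\log \bar{P}_e}{\log N} = \begin{cases} t/2 - 1 & t \text{ even} \\ t - 2 & t \text{ odd} \end{cases} \qquad (4)$$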

$h_2(x)$ is the (base-2) entropy function. The rate of the code
is in fact lower bounded by R, due to possible redundancy in
the parity check equations. Perhaps the most striking feature
of the theorem is that the right hand side of (4) is independent
of both R and C. This behavior stands in contrast to
the various bounds on the probability of error when using random
coding, where the bound is monotonically increasing with
increasing R or decreasing C.

Furthermore, it can be shown that (1)-(3) hold when either
R < C and t is large enough, or when for given t, D > 0 is
small enough, where D is a quality parameter of the channel,
$D = \sum_y \sqrt{P(y|0)P(y|1)}$ ($P(y|x)$ describes the channel).
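
For a channel with a finite output alphabet, the quality parameter D can be evaluated directly from the transition probabilities. The sketch below does this for a binary symmetric channel; the helper names and the crossover probability 0.1 are illustrative assumptions.

```python
import math


def channel_quality(p_y_given_0, p_y_given_1):
    """D = sum over outputs y of sqrt(P(y|0) * P(y|1))."""
    return sum(math.sqrt(a * b) for a, b in zip(p_y_given_0, p_y_given_1))


def bsc(p):
    """Transition probabilities (P(y|0), P(y|1)) of a BSC with crossover p."""
    return [1 - p, p], [p, 1 - p]


D = channel_quality(*bsc(0.1))
print(D)  # for a BSC this equals 2 * sqrt(p * (1 - p))
```

A noiseless channel gives D = 0 and a useless channel (p = 1/2) gives D = 1, so small D corresponds to a good channel.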

III. Codes derived from bipartite regular graphs

A popular method for obtaining an ensemble of sparse
parity-check codes is defined in terms of a bipartite graph.
This is done by constructing a c-d regular bipartite graph in
which there are N vertices on the left side of the graph, each
of degree c, and L vertices on the right, each of degree d, so
that Nc = Ld. This ensemble is described in [3].
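
One common way to draw such a graph is to give every left vertex c "sockets" and every right vertex d sockets, and to match the two socket lists by a random permutation. The sketch below follows that approach; it is an assumption about how the ensemble could be sampled, not necessarily the exact construction of [3], and it may produce parallel edges.

```python
import random
from collections import Counter


def sample_regular_bipartite_edges(N, c, d, rng):
    """Return (L, edges) for a random c-d regular bipartite multigraph.

    N left vertices of degree c are matched to L = N*c/d right vertices
    of degree d through a uniformly random pairing of sockets.
    """
    if (N * c) % d != 0:
        raise ValueError("N*c must be divisible by d so that Nc = Ld")
    L = N * c // d
    left_sockets = [v for v in range(N) for _ in range(c)]
    right_sockets = [u for u in range(L) for _ in range(d)]
    rng.shuffle(right_sockets)
    return L, list(zip(left_sockets, right_sockets))


L, edges = sample_regular_bipartite_edges(N=12, c=3, d=6, rng=random.Random(1))
left_degrees = Counter(v for v, _ in edges)
right_degrees = Counter(u for _, u in edges)
print(L, len(edges))
```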

It can be shown that the following holds:

$$\lim_{N \to \infty} \frac{-\log \bar{P}_e}{\log N} = \begin{cases} c/2 - 1 & c \text{ even} \\ c - 2 & c \text{ odd} \end{cases}$$

provided that c and d satisfy conditions analogous to the
conditions set in Theorem 1.

IV.  Expurgated  ensembles 

Our results can be greatly improved by expurgating from
the ensembles codes which have small minimum distance. It
turns out that when the ensemble parameter t (c and d) is
sufficiently large, the error probability of the expurgated
ensemble is exponentially decreasing, and the exponent is
arbitrarily close to the random-coding exponent.

Abstract - Puncturing is the predominant strategy
to construct high code rate convolutional encoders,
and infinite impulse response convolutional encoders
are an essential building block in Turbo codes. In this
paper various properties of convolutional encoders
with these characteristics are developed. In particular,
the closed form representation of a punctured
convolutional encoder and its generator matrix is
constructed, necessary and sufficient conditions are given
such that the punctured encoders retain the infinite
impulse response property, and various lower bounds
on distance properties, such as effective free distance,
are developed. Finally, necessary and sufficient
conditions are given on the inverse puncturing problem:
representing a known convolutional encoder as
a punctured encoder.

Turbo codes, introduced in 1993 by Berrou et al. [1], use
systematic infinite impulse response (IIR) convolutional
encoders as building blocks. The IIR or recursive constraint is
imposed to achieve interleaver gain [2]. The systematic
constraint is imposed so that the information bits are used only
once in a codeword together with the parity bits from both
constituent binary systematic convolutional encoders. In this
paper the interest is in convolutional codes for Turbo codes
on bandlimited channels. Such channels force the code rate
of each constituent code to be high. Since punctured
convolutional encoders are the most practical class of convolutional
encoders that generate high code-rate codes, puncturing is
imposed on the constituent convolutional encoders. In this
paper, we refer to the original encoder from which the
punctured encoder is derived as the parent encoder.
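
To make the terms parent encoder and punctured encoder concrete, the following sketch punctures a rate-1/2 recursive systematic parent encoder to rate 2/3. The particular parent, with parity transfer function $(1 + D^2)/(1 + D + D^2)$, and the puncturing pattern (keep every systematic bit and every second parity bit) are assumptions chosen for illustration; they are not taken from the paper.

```python
def rsc_encode(bits):
    """Rate-1/2 systematic IIR encoder with parity (1 + D^2)/(1 + D + D^2).

    Returns the systematic stream and the parity stream.
    """
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # feedback polynomial 1 + D + D^2
        parity.append(a ^ s2)  # feedforward polynomial 1 + D^2
        s1, s2 = a, s1
    return list(bits), parity


def puncture(systematic, parity, keep_parity=(1, 0)):
    """Keep all systematic bits; keep parity bit t only if the pattern says so."""
    out = []
    for t, (u, p) in enumerate(zip(systematic, parity)):
        out.append(u)
        if keep_parity[t % len(keep_parity)]:
            out.append(p)
    return out


impulse = [1] + [0] * 11
sys_bits, par_bits = rsc_encode(impulse)
punctured = puncture(sys_bits, par_bits)
print(par_bits)
print(len(impulse), len(punctured))  # 12 information bits, 18 coded bits
```

The parity response to a single 1 never dies out, which is the IIR property. Whether a punctured encoder keeps this property depends on the pattern, which is the question the paper addresses.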

In this paper, we are interested in designing binary punctured
convolutional codes that are IIR, may or may not be
systematic, and that perform well when used in a Turbo
setting. To characterize the effectiveness of such encoders, in
[2, 3, 4] the commonly used free distance is replaced with
the effective free distance $d_2$, the minimum weight among all
codewords with weight 2 information sequences.
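
The effective free distance can be found for a small encoder by a direct search over weight-2 inputs. The sketch below does this for a hypothetical rate-1/2 parent with parity $(1 + D^2)/(1 + D + D^2)$ and for a rate-2/3 punctured version that keeps every second parity bit. The encoder, the pattern and the search limit `max_gap` are assumptions for illustration, and the search covers only input pairs at most `max_gap` positions apart.

```python
def rsc_step(state, u):
    """One step of the recursive encoder; returns (next_state, parity_bit)."""
    s1, s2 = state
    a = u ^ s1 ^ s2
    return (a, s1), a ^ s2


def effective_free_distance(keep_parity=(1,), max_gap=12):
    """Minimum codeword weight over weight-2 inputs 1 + D^gap, all phases."""
    period = len(keep_parity)
    best = None
    for phase in range(period):
        for gap in range(1, max_gap + 1):
            state, weight = (0, 0), 0
            for t in range(gap + 1):
                u = 1 if t in (0, gap) else 0
                state, p = rsc_step(state, u)
                weight += u                       # systematic bit is always sent
                if keep_parity[(phase + t) % period]:
                    weight += p                   # parity bit survives puncturing
            # Only inputs that drive the encoder back to the zero state
            # give finite-weight codewords.
            if state == (0, 0) and (best is None or weight < best):
                best = weight
    return best


d2_parent = effective_free_distance(keep_parity=(1,))
d2_punctured = effective_free_distance(keep_parity=(1, 0))
print(d2_parent, d2_punctured)
```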

In this paper, polyphase representation and polyphase
decomposition [5] are generalized. We also introduce polyphase
composition. Some properties of these polyphase transforms
are derived that will form the building blocks for the rest
of the paper. Also in this paper, a punctured convolutional
encoder is represented in closed form using polyphase
transforms. Finally, the generator matrix of a punctured encoder is
concisely derived similarly to McEliece [5] and Hole [6] where
the parent encoder was assumed to be finite impulse response
(FIR). Generator matrices for rate-2/3 punctured systematic
encoders were derived from a parent rate-1/2 encoder in [4].

When an IIR convolutional code is punctured the resultant
encoder is not necessarily IIR. In this paper, given an IIR
convolutional encoder, necessary and sufficient conditions are
derived to ensure that the resulting punctured encoder is IIR.

Various lower bounds are found in this paper on the effective
free distance $d_2$ for punctured parent codes. More specifically,
for any rate-$1/n_0$ parent encoder, a sufficient condition
is given that guarantees a punctured rate-$k/n$ encoder with
$d_2 > t$, where $1 < t < n$. Also, for a systematic rate-1/2 parent
encoder with irreducible feedback polynomial, sufficient
conditions, which include a necessary and sufficient condition
for a class of parent encoders, are given for $d_2 \ge 3$ of the
generated punctured rate-$k/(k+1)$ encoders. Note that when
the encoder is IIR, $d_2 \ge 3$ also implies minimum free distance
greater than or equal to 3 which, as pointed out by Divsalar et al.
in [7], is a crucial condition on the outer code for serial turbo
codes to have interleaver gain.

Good non-punctured convolutional codes have been
comprehensively studied [5, 8]. In general, using these codes,
punctured encoders are constructed. However, it is not known
whether the rate-$k/n$ good convolutional codes themselves can
be encoded as a punctured encoder with a rate-$k_0/n_0$ parent
encoder such that $k_0$ is much smaller than $k$. It is shown in this
paper that any rate-$k/n$ systematic convolutional code can be
encoded by a punctured systematic encoder with a rate-$k_0/n_0$
parent encoder for any factor $k_0$ of $k$ and for some $n_0$ ($< n$).
Furthermore, given $k_0$ and $n_0$, a necessary and sufficient
condition is given that guarantees that a rate-$k/n$ convolutional
code can be generated from a rate-$k_0/n_0$ parent convolutional
encoder.

Abstract - In this paper we present the search and determination
of a subset of orthogonal convolutional codes called Convolutional
Self-Doubly Orthogonal Codes (CSO2C). These codes may
be advantageously utilised for the novel coding/iterative decoding
technique introduced as an important improvement of Turbo
Codes. For this technique the code constraint length corresponds
to the latency of each decoding iteration. Hence, an important
parameter in the code search is the minimisation of the code
constraint length for a given error correcting capability.

I.  Introduction 

The new coding system presented in [1], [2] represents
an important improvement over the classical turbo code
architecture. However, it requires the use of threshold
decodable codes which must exhibit further orthogonal
properties than the well known orthogonal codes [4]. The
methods initially used to generate these codes were based
on principles of finite field Projective Geometry. We present
new techniques based on the use of a random parameter
which produce CSO2C with substantially reduced length.

II. Wide-Sense CSO2C with Rate R = 1/2

A rate R = 1/2 convolutional code having J connections is
said to be doubly orthogonal in the wide sense if its J
generators $\{g_i\}$ satisfy the relation:

$$\forall (i,j,k,l),\ i \ne j,\ k \ne l,\ j \ne k,\ i \ne l: \quad \text{the differences } g_i - g_j - (g_l - g_k) \text{ are distinct} \qquad (1)$$
(except for unavoidable index permutations).

Code  generation  technique  : 

A pseudo-random constructive method for determining the
code generators is used. Starting from a set of J acceptable
generators, we try to add an element taken among the
natural integers arranged in ascending order. Should the new
set of J+1 elements so obtained prove to be self-doubly
orthogonal, a random test is run in order to decide whether
or not to retain this additional integer. The procedure is
repeated anew until the required number of elements is
obtained.
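
The test of relation (1) and the pseudo-random growth of the generator set can be sketched as follows. This is a reading of the description above, not the authors' program: relation (1) is taken to mean that the values $g_i + g_k - g_j - g_l$, with neither of $i, k$ equal to either of $j, l$, are distinct once the unavoidable swaps $i \leftrightarrow k$ and $j \leftrightarrow l$ are identified. The function names, the acceptance probability, the starting set {0} and the search limit are all assumptions.

```python
import random
from itertools import combinations_with_replacement


def double_differences(g):
    """All values g_i + g_k - g_j - g_l with {i, k} disjoint from {j, l}.

    Unordered index pairs are used so that the unavoidable permutations
    (swapping i with k, or j with l) are counted once.
    """
    idx = range(len(g))
    values = []
    for i, k in combinations_with_replacement(idx, 2):
        for j, l in combinations_with_replacement(idx, 2):
            if i in (j, l) or k in (j, l):
                continue
            values.append(g[i] + g[k] - g[j] - g[l])
    return values


def is_doubly_orthogonal(g):
    values = double_differences(g)
    return len(values) == len(set(values))


def grow_generators(J, accept_prob=0.5, seed=0, start=(0,), max_candidate=10_000):
    """Add integers in ascending order; keep a valid one with prob. accept_prob."""
    rng = random.Random(seed)
    g = list(start)
    candidate = max(g) + 1
    while len(g) < J:
        if candidate > max_candidate:
            raise RuntimeError("no acceptable generator found below max_candidate")
        if is_doubly_orthogonal(g + [candidate]) and rng.random() < accept_prob:
            g.append(candidate)
        candidate += 1
    return g


print(grow_generators(3, accept_prob=1.0))
```

With `accept_prob=1.0` the procedure reduces to a plain greedy search. Smaller values skip some valid integers at random, which is what allows different runs to explore different generator sets.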

Length  reduction  : 

Improvement on the code length is attempted by using a
reduction method based on the following observation: any
addition or multiplication applied to $\{g_i\}$ maintains the
double orthogonality property. The reduction consists in
performing these elementary operations modulo an integer
n which is gradually decreased until the largest reduction
is obtained.

Results  : 

The code generation method and its ensuing reduction
procedure have yielded good novel CSO2C codes which are
superior to those obtained by the previous procedure.


III.  STRICT-SENSE  CS02C  WITH  RATES  R- 

The self-double orthogonality in the strict sense is
obtained by allowing a single connection between each
information sequence and each parity sequence [1]. That is, the
code generators $\{g_{i,j}\}$ must satisfy:

$$\forall (k,v),\ \forall (l,m,n),\ l \ne n,\ m \ne k \text{ and } m \ne v: \quad (g_{k,l} - g_{m,l}) - (g_{v,n} - g_{m,n}) \text{ are distinct.}$$

Code  generation  technique  : 

Once again we investigate a method which includes a
random parameter. Starting from a set of generators that we
know to be self-doubly orthogonal, we perform a random
reordering of the indices of our matrix $(g_{i,j})$. Then, we
replace each generator (taken in the order previously
established) with the smallest natural integer that maintains
the property of double orthogonality.

Length  reduction  : 

The reduction procedure is based on the method proposed
by Wu [3]. Simple addition and subtraction operations are
performed over the rows and columns of the initial matrix
of generators. An algorithm was developed for executing
the procedure iteratively until a set of generators whose
largest element is as small as possible is obtained.

Results  : 

The  results  obtained  show  that  substantial  reduction  of  the 
lengths  of  the  codes  could  be  achieved  without  requiring  an 
excessive  computation  time. 

Conclusion 

The novel methods have yielded very interesting results
with the generation of different sets of CSO2C with
reduced lengths. Both types of self-doubly orthogonal codes
(wide and strict sense) have been analysed and compared.
The error performances of all these codes have been
determined by simulation using the novel iterative decoding
algorithm [1]. The new sets of CSO2C generated improve
significantly the performance of the novel coding/iterative
decoding system by limiting both the decoding latency
and the amount of memory required.

Abstract - A class of rate-k/2k self-dual convolutional
codes is defined, which includes, for instance,
the Golay Convolutional Code. It is shown that codes
in this class are not asymptotically catastrophic in the
sense defined by Hemmati and Costello [3].

I.  Introduction 

Block codes can be obtained from convolutional codes via
the tail-biting construction where zero-tail termination is
replaced by tail-biting, avoiding the rate loss. If the
convolutional code used has long low-weight codewords, the
resulting block code can have poor weight distribution.
Convolutional codes which have long codewords of low weight were
called asymptotically catastrophic by Hemmati and Costello
[3]. In this paper, we show that the class of time-invariant
unit-memory rate-k/2k self-dual convolutional codes is not
asymptotically catastrophic.

II.  Self-Dual  (2k,k,k)  Convolutional  Codes 

A rate-$k/n$ convolutional code with overall constraint
length $\nu$ and memory order m [2] can have a time-varying
or time-invariant encoder. A time-varying $(n, k, \nu)$
convolutional encoder can alternatively be viewed as a time-invariant
unit-memory $(n' = nm, k' = km, \nu)$ encoder with memory
order m' = 1 [2].

A linear code (block or convolutional) is self-orthogonal if
it is contained in its dual, and self-dual if it is equal to its dual
[1]. The dual of a linear code is the set of all codewords that
are orthogonal to the codewords in the code. For convolutional
codes, we have the related concept of the convolutional
dual code. If a convolutional code is generated by a matrix
G(D), then its convolutional dual code is generated by a
matrix H(D), with $G(D)H^T(D) = 0$.

We define the class $\mathcal{S}$ of (2k, k, k) self-dual convolutional
codes as follows. We only consider time-invariant unit-memory
$(2k, k, \nu)$ convolutional codes that also have $\nu = k$,
for k > 1. Then, we further restrict ourselves to self-dual
(2k, k, k) codes to get the class $\mathcal{S}$. Note that the Golay
Convolutional Code [1] belongs to this class as an (8,4,4) code.

III.  Main  Result 

Let $w_0$ denote the minimum average weight per branch over
all cycles in the state transition diagram of a convolutional
encoder, excluding the zero-weight self-loop around the zero
state. Hemmati and Costello [3] defined a class of codes to
be asymptotically catastrophic if $w_0$ approaches zero as codes
with increasing $\nu$ are considered. Many convolutional code
classes are asymptotically catastrophic [3, 4].
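
The quantity $w_0$ is a minimum cycle mean of the weighted state diagram, so for a small encoder it can be computed exactly. The sketch below does this for the familiar rate-1/2 feedforward encoder with generators $1 + D^2$ and $1 + D + D^2$, chosen here only as an example (it is not a code from the class studied in this paper). The dynamic program over closed walks of length at most the number of states is one simple way to obtain the minimum cycle mean.

```python
from fractions import Fraction


def state_diagram():
    """Edges (src, dst, output weight) of the (1 + D^2, 1 + D + D^2) encoder.

    The zero-weight self-loop at the zero state is left out, as in the
    definition of w_0.
    """
    edges = []
    for s1 in (0, 1):
        for s2 in (0, 1):
            for u in (0, 1):
                out1 = u ^ s2
                out2 = u ^ s1 ^ s2
                src, dst = (s1, s2), (u, s1)
                if src == (0, 0) and dst == (0, 0):
                    continue
                edges.append((src, dst, out1 + out2))
    return edges


def min_average_cycle_weight(edges):
    """Minimum over all cycles of (total weight) / (number of branches)."""
    nodes = sorted({e[0] for e in edges} | {e[1] for e in edges})
    inf = float("inf")
    best = None
    for start in nodes:
        dist = {v: inf for v in nodes}
        dist[start] = 0
        # A simple cycle has at most len(nodes) branches.
        for length in range(1, len(nodes) + 1):
            new = {v: inf for v in nodes}
            for src, dst, w in edges:
                if dist[src] + w < new[dst]:
                    new[dst] = dist[src] + w
            dist = new
            if dist[start] != inf:
                mean = Fraction(dist[start], length)
                if best is None or mean < best:
                    best = mean
    return best


w0 = min_average_cycle_weight(state_diagram())
print(w0)
```

For this encoder the minimum is attained by the two-branch cycle between states (1,0) and (0,1), which has total weight 1.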

Let H(D) be a canonical parity-check matrix for the
convolutional code, and let $e_i$, $1 \le i \le r = n - k$, be the
maximum degree of the polynomials in the ith row of the
matrix. Without loss of generality, assume the ordering
$e_1 = e_2 = \dots = e_\gamma = 0$ for some $\gamma$, $0 \le \gamma \le r$, and
$1 \le e_{\gamma+1} \le \dots \le e_r = e_{\max}$. If $\gamma > 0$, the first $\gamma$ rows of
H(D) define a parity-check matrix for an $[n, n - \gamma]$ binary
block code $\mathcal{E}$, with minimum distance $d_e$ [5]. For $\gamma = 0$, let
$\mathcal{E}$ be the trivial [n, n] block code having all possible binary
n-tuples, with $d_e = 1$. Hole [5] has recently obtained the lower
bound

$$w_0 \ge d_e / e_{\max}. \qquad (1)$$

A class of convolutional codes is not asymptotically
catastrophic if $w_0$ is bounded away from zero as $\nu$ increases. In
practice, it is often important to ensure that a class of codes is
not asymptotically catastrophic. For instance, if longer block
codes are obtained via tail-biting constructions from
convolutional codes, the resulting distance properties can become
dependent on whether the parameter $w_0$ is high or low.

Proposition 1 The class $\mathcal{S}$ of (2k, k, k) self-dual convolutional
codes defined above is not asymptotically catastrophic.

Proof. For any code in $\mathcal{S}$, the corresponding convolutional
dual code is generated by the reverse $\tilde{G}(D)$ of the generator
matrix G(D), i.e., $H(D) = \tilde{G}(D)$. Since G(D) has all its row
degrees equal to one, the corresponding parity-check matrix
H(D) must have overall constraint length $\nu = k$. But H(D) is
also a k x 2k matrix whose row degrees $e_1, e_2, \dots, e_r = e_k$ must
sum to k. Hence all dual row degrees $e_1 = \dots = e_k = e_{\max} = 1$,
so $w_0 \ge d_e$. Further, since no $e_i$ is equal to zero, we have
$\gamma = 0$ and $d_e = 1$. Therefore, $w_0 \ge 1$, and the statement of
the proposition follows. Q.E.D.

This implies that block codes obtained from convolutional
codes in the class $\mathcal{S}$ via the tail-biting construction are good
from the weight distribution perspective, as shown for block
codes obtained from the Golay Convolutional Code [6].

Abstract - The generalized Singleton bound and
MDS-convolutional codes are reviewed. For each n, k
and $\delta$ an elementary construction of rate k/n MDS
convolutional codes of degree $\delta$ is given.

I.  Introduction 


The minimum distance of a block code is upper bounded by
the Singleton bound $d_{\min} \le n - k + 1$. Codes attaining this
bound are called MDS block codes, and Reed-Solomon codes
are examples of such codes. Since convolutional codes
generalize block codes, it is natural to study the way the Singleton
bound is generalized to convolutional codes.

Let $\mathbb{F}$ be a finite field and G(D) be a k x n full rank matrix
over $\mathbb{F}[D]$. Let $C = \{u(D)G(D) \mid u(D) \in \mathbb{F}^k[D]\}$ be the rate
k/n convolutional code generated by G(D). Two generator
matrices G(D) and G'(D) are equivalent if they generate the
same convolutional code C; in that case there exists a k x k
unimodular matrix U(D) with G'(D) = U(D)G(D). We say that
G(D) is catastrophic if a non-polynomial message u(D) can
result in a polynomial codeword u(D)G(D). This can happen
if and only if the k x k minors of the matrix G(D) have a
non-constant common divisor other than D. We will suppose
G(D) is noncatastrophic.

Along with n and k, there is a third important parameter
of a convolutional code C, called the degree. It is defined as
the maximal degree $\delta$ of the k x k minors of G(D). Equivalent
encoding matrices have the same degree so the degree is an
invariant of the code. See [3] for details.

We define the weight of a polynomial $v(D) \in \mathbb{F}^n[D]$ as the
sum of the Hamming weights of all its $\mathbb{F}^n$-coefficients and the
free distance of the code as:

$$d_{free} = \min\{\mathrm{wt}(v(D)) \mid v(D) \in C,\ v(D) \ne 0\}.$$

Lemma 1 [3] Let C be a convolutional code of rate k/n and
degree $\delta$. Then the free distance must satisfy:

$$d_{free} \le (n - k)\left(\lfloor \delta/k \rfloor + 1\right) + \delta + 1. \qquad (1)$$

We call the bound (1) the generalized Singleton bound. For
$\delta = 0$ the bound is the classical bound n - k + 1. We showed
in [3] that there are codes attaining this bound over sufficiently
large finite fields. We called such codes MDS convolutional
codes. The existence proof in [3] was non-constructive and it
was based on methods from algebraic geometry.

II. A Construction of Rate k/n MDS Convolutional Codes

In this section we follow [5] and provide a concrete
construction of an MDS convolutional code for each degree $\delta$ and
each rate k/n. The construction makes use of [1, 2].

As defined in [1, 2], a convolutional code is said to be
generated by a polynomial

$$g(D) = g_0(D^n) + g_1(D^n)D + \dots + g_{n-1}(D^n)D^{n-1},$$

if it has a polynomial encoder of the form

$$G(D) = \begin{bmatrix} g_0(D) & g_1(D) & \cdots & g_{n-1}(D) \\ D g_{n-1}(D) & g_0(D) & \cdots & g_{n-2}(D) \\ \vdots & \vdots & & \vdots \\ D g_{n-k+1}(D) & D g_{n-k+2}(D) & \cdots & g_{n-k}(D) \end{bmatrix}. \qquad (2)$$

'The  authors  were  supported  in  part  by  NSF  grant  DMS-96- 
10389.  The  first  author  was  also  supported  by  a  fellowship  from  the 
Center  of  Applied  Mathematics  at  the  University  of  Notre  Dame. 


The code C generated by G(D) is isomorphic to

$$\{(u_0(D^n) + u_1(D^n)D + \dots + u_{k-1}(D^n)D^{k-1}) \cdot g(D)\},$$
where $(u_0(D), \dots, u_{k-1}(D)) \in \mathbb{F}^k[D]$ is an information vector.

Lemma 2 [5] Let p be a prime and k < n, $\delta$ nonnegative
integers with p and n relatively prime. Then there exist positive
integers r and a with

$$a \ge \lfloor \delta/k \rfloor + 1 + \delta/(n - k), \qquad an = p^r - 1.$$

Assume that a, r are as in Lemma 2 and let $N = an$,
$K = N - (n - k)(\lfloor \delta/k \rfloor + 1) - \delta$, and $\alpha \in \mathbb{F}_{p^r}$ a primitive
element of $\mathbb{F}_{p^r}$. Define
$g(D) = (D - \alpha^0)(D - \alpha^1) \cdots (D - \alpha^{N-K-1}) \in \mathbb{F}_{p^r}[D]$.
The polynomial g(D) defines an [N, K]
Reed-Solomon block code with distance $d_g = N - K + 1 =
(n - k)(\lfloor \delta/k \rfloor + 1) + \delta + 1$.

Using  [1,  Theorem  3]  we  obtain: 

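Theorem 3 [5] Let g(D) be defined as above. Then the
convolutional code defined by (2) is MDS.

Example 4 [5] Let $\alpha$ be a primitive element of $\mathbb{F}_{2^6}$. The rate 2/3
encoder

$$G(D) = \begin{bmatrix} \alpha^{28} + \alpha^{35}D + \alpha^{57}D^2 & 1 + \alpha^{6}D + \alpha^{42}D^2 & \alpha^{8} + \alpha^{26}D + D^2 \\ \alpha^{8}D + \alpha^{26}D^2 + D^3 & \alpha^{28} + \alpha^{35}D + \alpha^{57}D^2 & 1 + \alpha^{6}D + \alpha^{42}D^2 \end{bmatrix}$$

has degree 5 and has free distance 9. The code attains the
generalized Singleton bound (1) and therefore is an MDS
convolutional code.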
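
The numbers in Example 4 can be checked against the generalized Singleton bound (1) and the parameters of Lemma 2 with a few lines of arithmetic. The sketch below assumes the inequality of Lemma 2 reads $a \ge \lfloor \delta/k \rfloor + 1 + \delta/(n-k)$; the function names and the search limit `max_r` are hypothetical.

```python
from fractions import Fraction


def generalized_singleton_bound(n, k, delta):
    """Right-hand side of bound (1) for rate k/n and degree delta."""
    return (n - k) * (delta // k + 1) + delta + 1


def smallest_field_parameters(p, n, k, delta, max_r=64):
    """Smallest r (and a) with a*n = p^r - 1 and a large enough for Lemma 2."""
    threshold = delta // k + 1 + Fraction(delta, n - k)
    for r in range(1, max_r + 1):
        if (p ** r - 1) % n == 0:
            a = (p ** r - 1) // n
            if a >= threshold:
                return r, a
    raise ValueError("no suitable r found up to max_r")


def reed_solomon_parameters(n, k, delta, a):
    """[N, K] and distance of the Reed-Solomon code used in the construction."""
    N = a * n
    K = N - (n - k) * (delta // k + 1) - delta
    return N, K, N - K + 1


r, a = smallest_field_parameters(p=2, n=3, k=2, delta=5)
print(generalized_singleton_bound(3, 2, 5), (r, a), reed_solomon_parameters(3, 2, 5, a))
```

For n = 3, k = 2 and degree 5 over characteristic 2, this gives r = 6, in agreement with the field $\mathbb{F}_{2^6}$ of the example, and a Reed-Solomon distance equal to the bound.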

If one is interested in carrying out the construction over small
fields, then one should find a prime power q for which

n|(9-l)and9>5^^.  (3) 

The  first  author  recently  showed  [4]  that  there  are  alterna¬ 
tive  constructions  for  unit  memory  MDS  convolutional  codes, 
these  are  codes  where  5  <  k. 
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Abstract—  A  universal  lossless  resolution  scalable  (b)  There  is  exactly  one  production  rule  (3)  in  G(M|Q)  with 
progressive  image  code  is  presented  which  is  based  on  R  —  Q  (called  the  root  production  rule  of  G)M\Q)). 

the  concept  of  a  conditional  quadrisection  grammar.  (c)  A  square  array  of  nonterminal  production  rules  can  be 

made  bigger  by  simultaneous  replacement  of  each  entry 
I  Introduction  (3)  with  a  2  x  2  array  of  rules 

Let  A  be  a  finite  alphabet.  For  each  nonnegative  integer  r*  s  r  1  D  T  T  1 

n,  let  Mn  be  the  set  of  all  2n  x  2n  matrices  over  A.  Let  u  ~v  , 

M  =  U„A4„;  M  is  the  class  of  images  that  we  deal  with  here.  [  ]  F— ►  [  ] 


If  n  >  1  and  M  =  [M(i,  j)  :  i,  j  =  0, 1,  •  ■  • ,  2"  -  1]  is  an  image 
in  Mn,  we  let  4 ■  M  =  [M(2i,2j)  :  «,  j  =  0, 1,  -  *  • ,  2n— 1  —  1] 
be  the  image  in  Mn-\  obtained  by  downsampling  M.  For 
each  image  Q  £  M,  let  M(Q)  be  the  set  of  all  images  M  for 
which  4 .  M  =  Q.  For  each  n  >  0  and  each  M  £  Mn,  let 
M°,  M1,  •  •  • ,  M”  be  the  images  such  that  M"  =  M  and 

M'  =  4 .Af<+\  0  <  i  <  n 

A  lossless  resolution  scalable  progressive  image  code  (LR- 
SPIC)  <f>  on  M  consists  of  a  collection  of  binary  words 

<f>  =  {w(a)  :  a  £  A}  U  {w{M\Q)  :  Q  £  M,  M  £  M(Q)}  (1) 

such  that 

(i)  The  words  {u>(a)  :  a  £  A}  satisfy  the  prefix  condition. 

(ii)  For  each  Q  £  M,  the  words  {w(M\Q)  :  M  £  M(Q)} 
satisfy  the  prefix  condition. 

For  each  n  >  0  and  each  M  £  Mn,  the  LRSPIC  <j>  given  by 
(1)  encodes  M  into  the  binary  codeword  w^M)  given  by 

uv(M)  =  w(M°)w(Ml \M°)  ■  ■■w(Mn\Mn-1),  (2) 

the  left-to-right  concatenation  of  the  words  w(M°), 
w{Ml\M°),  ■  ■  ■  ,w(Mn\Mn~1). 

II.  Conditional  Quadrisection 

Let $Q$ and $M$ be images such that $M \in \mathcal{M}(Q)$. Supposing
that $Q$ is $2^j \times 2^j$, we let $T(Q)$ denote the distinct subimages
of $Q$ that appear in the partitions of $Q$ into $2^i \times 2^i$ subimages,
$0 \le i \le j$. The conditional quadrisection grammar $G(M|Q)$
[1] is a set of production rules of the form


where $R$ is a member of $T(Q)$, $B$ is an abstract symbol,
$C, D, E, F$ are all abstract symbols if $R \notin \mathcal{M}_0$ (in which
case (3) is said to be nonterminal) or are all members of $A$
if $R \in \mathcal{M}_0$ (in which case (3) is said to be terminal). The
grammar $G(M|Q)$ satisfies the properties:

(a)  Given  R  and  B,  there  is  at  most  one  production  rule  in 
G(M\Q)  of  form  (3). 

'Supported  by  NSF  Grants  NCR-9627965  and  CCR-9902081. 
where

$$R = \begin{bmatrix} S & T \\ U & V \end{bmatrix}.$$

Repeated  application  of  this  operation,  starting  from 
the  root  production  rule,  eventually  results  in  a  matrix 
of  terminal  production  rules  which  yields  M. 

Example: Let $Q = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ and let $G(M|Q)$ be the conditional quadrisection grammar

a]’  a^[°i  !]■  a±[\  ;]}•  <4> 

The  reconstruction  method  (c)  yields 

$$M = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

Given  Q,  M  can  be  encoded  with  9  codebits,  because  each 
rule  in  (4)  can  be  encoded  with  3  codebits. 

III.  Universality  Results 

To each conditional quadrisection grammar $G(M|Q)$ there
corresponds a binary codeword $w(M|Q)$ such that $G(M|Q)$
is recoverable from $Q$ and $w(M|Q)$. The binary codewords
$\{w(M|Q) : Q \in \mathcal{M},\ M \in \mathcal{M}(Q)\}$, augmented by any binary
codewords $\{w(a) : a \in A\}$ satisfying (i), induce a unique LRSPIC
on $\mathcal{M}$. Let $L(M)$ be the length of the codeword (2)
assigned to $M \in \mathcal{M}$ by this LRSPIC, and let $L_{fs}(M)$ be the
length of the codeword assigned to $M$ by a fixed (but arbitrary)
finite-state LRSPIC on $\mathcal{M}$.

Theorem  1  For  some  positive  constant  C, 

max  {L(M)  -  LU(M)}  <  C  [—1  ,  n  >  1 

AfGA^n  L  n  J 

Corollary 1 Let $[X(i,j) : i,j \text{ integers}]$ be a stationary random
field with entropy $H$. Let $M_n = [X(i,j) : i,j = 0,1,\ldots,2^n - 1]$. Then

$$\lim_{n \to \infty} E[L(M_n)]/4^n = H.$$
Abstract — A binary data string of length $2^k$ induces
a Boolean function of $k$ variables which can be represented
by a unique reduced binary decision diagram.
We losslessly compress the data string indirectly by
compressing this binary decision diagram. The resulting
data compression algorithm is universal.


The preceding discussion suggests a lossless data compression
algorithm which encodes a binary data string $u$ of length
a power of two in two steps:

Step  1:  Find  the  ROBDD  representation  of  the  Boolean 
function  induced  by  u. 

Step  2:  Compress  the  ROBDD  representation. 


I.  Introduction 

We start with the observation that a binary data string
of length $2^k$ induces a Boolean function of $k$ variables in a
natural way. The following example illustrates the procedure.

Example 1: A general binary data string

$$u = u_1 u_2 u_3 u_4 u_5 u_6 u_7 u_8 u_9 u_{10} u_{11} u_{12} u_{13} u_{14} u_{15} u_{16} \qquad (1)$$

of length 16 induces the following Boolean function
$f_u(x_1, x_2, x_3, x_4)$ of four variables:

$$\begin{array}{llll}
f_u(0,0,0,0) = u_1 & f_u(0,0,0,1) = u_2 & f_u(0,0,1,0) = u_3 & f_u(0,0,1,1) = u_4 \\
f_u(0,1,0,0) = u_5 & f_u(0,1,0,1) = u_6 & f_u(0,1,1,0) = u_7 & f_u(0,1,1,1) = u_8 \\
f_u(1,0,0,0) = u_9 & f_u(1,0,0,1) = u_{10} & f_u(1,0,1,0) = u_{11} & f_u(1,0,1,1) = u_{12} \\
f_u(1,1,0,0) = u_{13} & f_u(1,1,0,1) = u_{14} & f_u(1,1,1,0) = u_{15} & f_u(1,1,1,1) = u_{16}
\end{array}$$

Notice that we assigned values to $f_u(x_1, x_2, x_3, x_4)$ by running
through the 16 possibilities for the vector variable
$(x_1, x_2, x_3, x_4)$ in lexicographical order.

Boolean functions are commonly represented by finite, binary,
rooted, directed, acyclic, labelled graphs called binary
decision diagrams (BDD's) [1]. The BDD with the minimal
number of vertices that represents a given Boolean function
is unique and is called the ROBDD representation of
the Boolean function. (ROBDD stands for Reduced Ordered
Binary Decision Diagram.)
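
As a rough illustration of Step 1, the following Python sketch builds a reduced ordered BDD directly from a data string by recursive halving, assuming the variable order $x_1, x_2, \ldots, x_k$ and the lexicographic assignment of Example 1. The function name and node identifiers are hypothetical and are not taken from the paper.

```python
def build_robdd(u):
    """Return (root, nodes) for the ROBDD of the function induced by u.

    nodes maps (level, low_child, high_child) -> node id; terminals are "0" / "1".
    Assumes len(u) is a power of two and variable x_level splits a block in half.
    """
    if len(u) == 0 or len(u) & (len(u) - 1):
        raise ValueError("length of u must be a power of two")
    nodes = {}

    def build(block, level):
        if len(block) == 1:
            return block                      # terminal vertex
        half = len(block) // 2
        low = build(block[:half], level + 1)  # x_level = 0
        high = build(block[half:], level + 1)  # x_level = 1
        if low == high:                       # variable is redundant: skip it
            return low
        key = (level, low, high)
        if key not in nodes:                  # share identical subgraphs
            nodes[key] = len(nodes)           # integer id, distinct from "0"/"1"
        return nodes[key]

    return build(u, 1), nodes


root, nodes = build_robdd("0001000100011110")
print(len(nodes), "nonterminal vertices")
```

For the string used in Example 2 this yields six nonterminal vertices, which agrees with the six vertex labels A to F that appear in Example 3.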

Example 2: Taking the string in (1) to be $u = 0001000100011110$,
the figure below depicts the ROBDD representation
of the Boolean function $f_u(x_1, x_2, x_3, x_4)$.

[Figure: ROBDD representation of $f_u$]


'Supported  by  NSF  Grants  NCR-9627965  and  CCR-9902081. 
2Supported  by  Canadian  NSERC  Grant  RGPIN203035-98. 


II.  Compression  Details 

The ROBDD representing the Boolean function induced by a
binary string of length $2^k$ is reconstructible from $k + 1$ recursively
generated strings $S_1, S_2, \ldots, S_{k+1}$, constructed as in the
following example.

Example 3: The ROBDD representation in the figure is
coded into the five strings:

$$S_1 = A, \quad S_2 = B^2C, \quad S_3 = BBE, \quad S_4 = 0^2D1^2F, \quad S_5 = 001110$$

Each first appearance of a symbol in $\{A, B, C, D, E, F\}$ in a
string $S_i$ (corresponding to a nonterminal vertex of the BDD)
produces two symbols in the next string $S_{i+1}$ (corresponding
to the two daughter vertices, bottom daughter vertex first).
Powers are used to indicate the presence of missing variables:
for example, in $S_2$, the bottom daughter vertex of $A$ is denoted
$B^2$ to indicate that there are $2 - 1 = 1$ missing variables
between vertex $A$ and vertex $B$. Each first appearance of
a symbol in $\{A, B, C, D, E, F, 0, 1\}$ raised to a power $\ge 2$ in
a string $S_i$ brings about an appearance of that symbol to a
power one less in the next string $S_{i+1}$ (e.g., $B^2$ in $S_2$ becomes
$B$ in $S_3$). The decoder knows the first string $S_1$, and is sent
codebits by the encoder to allow each $S_i$ to be built from $S_{i-1}$.
(For complete encoder/decoder description, see [2].)
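
The string construction of Example 3 can be reproduced with the following Python sketch. It rests on two assumptions that are not stated explicitly in the text: the "bottom daughter" is taken to be the $x_i = 0$ branch, and vertex labels are assigned in order of first appearance, so the letters D and E come out swapped relative to the figure. The function name is hypothetical, and the sketch only handles up to 26 nonterminal vertices.

```python
def robdd_strings(u):
    """Return the strings S_1, ..., S_{k+1} for a binary string u of length 2^k."""
    k = len(u).bit_length() - 1
    if len(u) != 1 << k:
        raise ValueError("length of u must be a power of two")
    ids, info = {}, {}                        # unique table and id -> (level, low, high)

    def build(block, level):
        if len(block) == 1:
            return block                      # terminal "0" or "1"
        half = len(block) // 2
        low, high = build(block[:half], level + 1), build(block[half:], level + 1)
        if low == high:
            return low
        key = (level, low, high)
        if key not in ids:
            ids[key] = len(ids)
            info[ids[key]] = key
        return ids[key]

    root = build(u, 1)
    names = {}

    def name(v):
        if v in ("0", "1"):
            return v
        if v not in names:                    # label vertices A, B, C, ... on first use
            names[v] = chr(ord("A") + len(names))
        return names[v]

    strings, current = [], [root]
    for i in range(1, k + 2):
        tokens, seen, following = [], set(), []
        for v in current:
            level = info[v][0] if v in info else k + 1
            power = level - i + 1             # power - 1 variables are skipped
            tokens.append(name(v) + ("" if power == 1 else "^%d" % power))
            if v in seen:
                continue                      # only first appearances expand
            seen.add(v)
            if power >= 2:
                following.append(v)           # same symbol, power one less
            elif v in info:
                following.extend(info[v][1:])  # bottom (0-branch) daughter first
        strings.append("".join(tokens))
        current = following
    return strings


for s in robdd_strings("0001000100011110"):
    print(s)
```

Up to the relabelling of D and E, the output matches the five strings of Example 3; in particular the last string is 001110.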

III.  Universality  Result 

For each $k > 1$, let $B_k$ denote the set of all nonconstant binary
strings of length $2^k$ for which the left half of the string is not
equal to the right half of the string. For $u$ a member of $\bigcup_k B_k$,
let $L(u)$ denote the number of codebits into which $u$ is encoded
by our ROBDD-based compression method. Let $L_{fs}(u)$ be the
number of codebits into which $u$ is encoded by any fixed (but
arbitrary) finite-state lossless compression algorithm.

Theorem  1  For  any  k  >  1, 

max{L(u)  -  Lfs(u)}  <  C 

where  C  is  a  positive  constant  depending  only  on  the  number 
of  states  of  the  fixed  finite-state  compression  algorithm. 
Abstract — We present an information-theoretic
framework for the optimization of the order in which
embedded bit-plane coders encode image data.

I.  Summary 

An embedded image coder generates a code-stream with the
property that every prefix of the stream can be decoded to
reconstruct the original image data with a fidelity approaching
that of an "optimal" compression algorithm, tailored to
produce the same code length (data rate) as the prefix.
Embedding raises the problem of ordering information according
to its "value": since the code-stream can be truncated at any
time, we wish to transmit the most valuable information (in
the sense of reducing the distortion of the reconstructed image
the most) as early as possible. This ordering constraint,
in turn, can affect causality relations that are usually relied
upon to optimize the sequential probability assignment used
for coding. Bit-plane coding is a simple and natural embedded
coding technique that sequentially encodes the bits in the
binary representation of the coefficients produced by a linear
(e.g., wavelet) transformation of the image data. Until recently,
the importance of the ordering problem had not been
appreciated, and most early schemes, especially those based
on context modeling and arithmetic coding, simply encode
bit-planes in order of decreasing significance and according to
a fixed scanning pattern within bit-planes. A more principled
approach to the ordering problem was proposed in [1, 2, 3].

In this work, we formulate a fairly general framework for
the bit-plane technique, the embedding problem, and the desired
characteristics of the solution. A generalized notion of
bit-plane coding is formalized as a sequence of steps, each
step culminating with the encoding of either a ternary significance
event (whether or not a coefficient becomes non-zero
at a certain precision level, and possibly its sign), or a binary
refinement event (an additional precision bit for an already
significant coefficient). We index coefficients linearly as a sequence
$x^n = x_1 x_2 \cdots x_n$, and denote by $Q_i^{(m)}$ the value of $x_i$
quantized by a dead-zone quantizer with step size $2^{-(m_0+m)}\Delta$,
where $m > 0$ is the precision level, the integer $m_0$ satisfies
$1 > 2^{m_0} \max_i |x_i|/\Delta > 1/2$, and $\Delta > 0$. Thus, the
$m$-th bit-plane is given by the values of $Q_i^{(m)}$ conditioned on
$Q_i^{(m-1)}$, $1 \le i \le n$. We denote by $Q_j$ the information encoded
up through and including step $j$, by $I_j$ the index of the coefficient
whose quantized representation is updated at step $j$, and
by $M_j$ the new level of precision attained on that coefficient
with the update. Finally, we seek functions $\{f_j(\cdot)\}$ such that
$I_j = f_j(Q_{j-1})$, and probability assignments $P_j(Q_{I_j}^{(M_j)} \mid Q_{j-1})$,
used to encode the events. A sequence of pairs $\{(f_j, P_j)\}$ characterizes
a generalized bit-plane coding scheme.
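
The two event types can be made concrete with a small Python sketch. It assumes, as an illustration only, that the dead-zone quantizer index is $\mathrm{sign}(x)\lfloor |x| / (2^{-(m_0+m)}\Delta) \rfloor$; the function names and the sample coefficients are hypothetical.

```python
import math


def quantize(x, m, m0, delta):
    """Dead-zone quantizer index of x at precision level m (assumed floor rule)."""
    step = 2.0 ** (-(m0 + m)) * delta
    magnitude = math.floor(abs(x) / step)
    return magnitude if x >= 0 else -magnitude


def bitplane_event(x, m, m0, delta):
    """Classify what is coded for x when moving from level m - 1 to level m."""
    previous = quantize(x, m - 1, m0, delta)
    current = quantize(x, m, m0, delta)
    if previous == 0:
        # Ternary significance event: 0 (still zero), +1 or -1 (sign revealed).
        return ("significance", current)
    # Binary refinement event: one extra magnitude bit.
    return ("refinement", abs(current) - 2 * abs(previous))


# Two illustrative coefficients with delta = 1 and m0 = 0.
for x in (0.75, -0.625):
    print(x, [bitplane_event(x, m, 0, 1.0) for m in (1, 2, 3)])
```

Because the step size halves at each level, a coefficient that was zero at level $m-1$ can only take the indices $-1$, $0$ or $+1$ at level $m$, and an already significant coefficient gains exactly one bit.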

The selection of $\{(f_j, P_j)\}$ should strive to minimize, in
some sense, a distortion measure $D(R)$ over as wide a range
of rates $R$ as possible, and for as many images as possible. The
resulting global optimization problem appears intractable at
present, which has led to more localized, "greedy" heuristic
approaches. In [2], the following embedding principle is defined:
Select $I_j$ so as to maximize $E[D_{j-1} - D_j \mid Q_{j-1}] / E[R_j - R_{j-1} \mid Q_{j-1}]$,
the expected distortion reduction per expected
bit of description. Here, $D_j$ and $R_j$ denote, respectively, the
distortion and total code length after step $j$, and expectation
is taken with respect to some model for $x^n$. The separation
of significance and refinement decisions in EZW [4] roughly
conforms to this principle.

1Work done while the author was at HP Labs.

Although the embedding principle can still be computationally
demanding [3], it can also be applied to derive an
intra-bit-plane ordering of decisions. In this framework, we
generalize results in [2] to show that for a broad class of conditional
distributions on $\{x^n\}$, the principle dictates the encoding
of significance decisions in decreasing order of their
likelihood of being non-zero. This result follows from Proposition
1 below, where subscripts denote the underlying densities,
$P_{m,f_k}$ is the conditional probability that $|Q_i^{(m+1)}| = 1$,
and $D_{m,f_k}(0)$ (resp. $R_{m,f_k}(0)$) denotes the decrement (resp.
increment) in distortion (resp. ideal code length) when encoding
$Q_i^{(m+1)}$ conditioned on $Q_i^{(m)} = 0$.

Proposition 1 Given two symmetric densities $f_1$ and $f_2$
having the property that $f_1(y)/f_1(x) < f_2(y)/f_2(x)$ for all
$0 < x < y$, then, the following hold for all $m$, $i$ and $\Delta$:

1. $P_{m,f_1} < P_{m,f_2}$.

2. $E_{f_1}[\,x_i \mid Q_i^{(m)} = 1\,] < E_{f_2}[\,x_i \mid Q_i^{(m)} = 1\,]$.

3. $D_{m,f_1}(0)/R_{m,f_1}(0) < D_{m,f_2}(0)/R_{m,f_2}(0)$.

Proposition 1 covers zero-mean Laplacian and, more broadly,
generalized Gaussian densities. These families are often used
to model wavelet transform coefficients.

The established equivalence leads to an effective implementation
of the embedding principle as shown in [2], where context
modeling for ordering and coding are combined. These
ideas and techniques have been incorporated into the algorithm
at the core of the emerging image compression standard
JPEG 2000. The formal framework presented also provides
new insight into the effectiveness of other established
practical algorithms like SPIHT [5].
I. Introduction and Algorithm

Recently proposed in [1], the MPM (Multilevel Pattern
Matching) grammar transform underlies a lossless data compression
algorithm developed in [1]. In this paper, we extend
the MPM grammar transform to the case of side information
known to both the encoder and decoder, yielding
a conditional MPM grammar transform which is referred to
as the CMPM($r$, $I$) transform throughout this paper. Based
on the CMPM($r$, $I$) transform, we develop a universal lossless
data compression algorithm with side information called the
CMPM algorithm, which has linear time and storage complexity
and asymptotically achieves the conditional entropy rate
of any stationary, ergodic source pair. The advantage of using
side information, if any, for data compression is obvious; one
can considerably reduce the compression rate if the side information
is highly correlated with a sequence to be compressed.

Let $A^n$ denote the set of all sequences of length $n$ drawn
from a finite alphabet $A$, and let $x^n = x_1 \ldots x_n \in A^n$. In
the CMPM algorithm, $x^n$ is compressed indirectly via the
CMPM($r$, $I$) transform (for some positive integer parameters
$r$ and $I$) followed by conditional arithmetic coding. The input
to the transform is a sequence of pairs $(x_1, y_1), \ldots, (x_n, y_n)$ from a
joint alphabet $A \times A_y$, where the sequence $y^n$, drawn from the
finite alphabet $A_y$, is regarded as side information and known
to both the encoder and decoder. The transform output is a
multilevel structure called a CMPM grammar, in which each
level $i$ is represented by a pair of sequences $v^{(i)} \triangleq v_1^{(i)} v_2^{(i)} \ldots$
and $t^{(i)} \triangleq t_1^{(i)} t_2^{(i)} \ldots$. The sequence $v^{(i)}$ is then encoded
conditionally on $t^{(i)}$ by a zero-order arithmetic encoder for
$i = I, I - 1, \ldots, 0$.

1This work was supported in part by NSERC of Canada under
Grant RGPIN203035-98, by CITO, by the Premier's Research Excellence
Awards of Ontario, and by NSF under Grant CCR-9902081.

II. CMPM($r$, $I$) Grammar Transform

To simplify the description of the transform, we assume that $n$
is a multiple of $r^I$. Then, the CMPM($r$, $I$) transform generates
the levels $I$ through 1 by repeating the following three steps
for each level $i$:

S1: ($i = I$) Partition $x^n$ into blocks of $A$-symbols of length
$r^I$. Denote these blocks by variables $\tilde v_1^{(I)}, \ldots, \tilde v_{n/r^I}^{(I)}$,
and the resulting sequence $\tilde v_1^{(I)} \ldots \tilde v_{n/r^I}^{(I)}$ by $\tilde v^{(I)}$. Analogously,
partition $y^n$ into blocks of $A_y$-symbols of length
$r^I$, and denote these blocks by variables $\tilde t_1^{(I)}, \ldots, \tilde t_{n/r^I}^{(I)}$,
and the resulting sequence by $\tilde t^{(I)}$. For
brevity, we will call a block of $A$-symbols an "$A$-block"
and a block of $A_y$-symbols an "$A_y$-block".

S1: ($i < I$) For every $j$ such that $v_j^{(i+1)} = s$, partition the
$A$-block $\tilde v_j^{(i+1)}$ and the $A_y$-block $\tilde t_j^{(i+1)}$ into $r$ sub-blocks
of length $r^i$, yielding a sequence $\tilde v^{(i)}$ of $A$-blocks and a
sequence $\tilde t^{(i)}$ of $A_y$-blocks.

S2: Visit every $A_y$-block in the sequence $\tilde t^{(i)}$ from left to
right, and label all identical $A_y$-blocks with the same
integers and all distinct $A_y$-blocks with distinct integers
in increasing order, starting with 1. Denote each
label, or a $y$-token, corresponding to an $A_y$-block $\tilde t_j^{(i)}$
by $t_j^{(i)}$. For every distinct $y$-token $\gamma$, let $\tilde v^{(i)}|_\gamma$ denote
the subsequence $\{\tilde v_j^{(i)} : t_j^{(i)} = \gamma\}$. We call this subsequence
a conditional subsequence of $\tilde v^{(i)}$ since $\tilde v^{(i)}|_\gamma$
can be regarded as the sequence $\tilde v^{(i)}$ conditioned on the
$y$-token $\gamma$. All conditional subsequences of $\tilde v^{(i)}$ are processed
independently from each other in step S3.

S3: For each distinct $y$-token $\gamma$, visit every $A$-block in the
conditional subsequence $\tilde v^{(i)}|_\gamma$ from left to right and label
the first appearance of each distinct $A$-block $a$ in
this subsequence by a special symbol 's'. If the same
$A$-block $a$ appears in $\tilde v^{(i)}|_\gamma$ again, label it by an integer
so that all identical $A$-blocks $a$ in $\tilde v^{(i)}|_\gamma$, except for the
leftmost one, will be labeled by the same integer, which
is just the number of distinct $A$-blocks in $\tilde v^{(i)}|_\gamma$ up to
the first appearance of the $A$-block $a$ inclusive. We use
variable $v_j^{(i)}$ to denote the label of $A$-block $\tilde v_j^{(i)}$ in $\tilde v^{(i)}$.

For level 0, we perform only step S1, and instead of performing
steps S2 and S3, we let $v^{(0)}$ and $t^{(0)}$ be $\tilde v^{(0)}$ and $\tilde t^{(0)}$
respectively.
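
The labelling in steps S2 and S3 for a single level can be sketched in Python as follows. Blocks are represented as strings, and the function names and sample blocks are hypothetical; the sketch does not perform the partitioning of step S1.

```python
def y_tokens(y_blocks):
    """Step S2: give identical side-information blocks the same integer, from 1 up."""
    table = {}
    return [table.setdefault(block, len(table) + 1) for block in y_blocks]


def x_labels(x_blocks, tokens):
    """Step S3: label each block within the conditional subsequence of its y-token.

    A first appearance gets "s"; a repeat gets the number of distinct blocks seen
    in that conditional subsequence up to and including its first appearance.
    """
    ranks_by_token = {}
    labels = []
    for block, token in zip(x_blocks, tokens):
        ranks = ranks_by_token.setdefault(token, {})
        if block in ranks:
            labels.append(ranks[block])
        else:
            ranks[block] = len(ranks) + 1
            labels.append("s")
    return labels


x_blocks = ["ab", "ab", "cd", "ab", "cd"]
y_blocks = ["00", "01", "00", "00", "01"]
tokens = y_tokens(y_blocks)
print(tokens, x_labels(x_blocks, tokens))
```

Only the blocks labelled "s" are subdivided at the next lower level, which is how repeated patterns that share the same side information are coded once.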

III.  Optimality  Results 

Let $r_{cmpm}(x^n|y^n)$ be the compression rate in bits per letter
resulting from using our CMPM algorithm to encode $x^n$ given
$y^n$. Let $r_k^*(x^n|y^n)$ be the smallest compression rate among
all conditional arithmetic coding algorithms with $k$ contexts
which condition on $y^n$ and operate letter by letter. Then,
based on the framework of grammar-based codes [2], we have
established the following optimality results:

Theorem  1.  [rcmpm  (*"|y")  -  r;(z"|y")]  =  O  (^) 

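Corollary 1. For any stationary, ergodic source pair $XY = \{(X_i, Y_i)\}_{i=1}^{\infty}$
with alphabet $A \times A_y$, $r_{cmpm}(x^n|y^n)$ converges
to $H_\infty(X|Y) \triangleq \lim_{m \to \infty} \frac{1}{m} H(X_1 \cdots X_m \mid Y_1 \cdots Y_m)$ with
probability one as $n \to \infty$.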
Abstract — In this paper, we present a construction
for binary sequences $\{s(t)\}$ of period $N = p^m - 1$ for an
odd prime $p$ based on the polynomial $(z + 1)^d + az^d + b$
with optimal three-level autocorrelation.

I.  Construction  of  New  Binary  Sequences 

Recently, there has been much progress in constructing balanced
binary sequences of period $2^m - 1$ with ideal autocorrelation
[1, 2, 4]. The idea of the (new) construction is to use
a special polynomial over finite fields. In this paper, we generalize
it to generate binary sequences of period $p^m - 1$ with
optimal autocorrelation for any prime $p$ and an integer $m$.

Let $F$ denote the field of $p^m$ elements and $F^* = F \setminus \{0\}$.
For $a, b \in F$ and a positive integer $d$, consider the subset of
$F^*$ given by

$$I(a, b) = \{x \mid x = (z + 1)^d + az^d + b,\ z \in F\} \setminus \{0\}.$$

The characteristic sequence $\{s_{a,b}(t)\}$ of the set $I(a, b)$ in $F^*$ is
defined by $s_{a,b}(t) = 1$ if $\alpha^t \in I(a, b)$ and $s_{a,b}(t) = 0$ otherwise,
where $\alpha$ is a primitive element of $F$.

Proposition 1 Let $p \ge 3$, $d = 2$, and $a, b \in F$ with $a + 1 \ne 0$.
Then, $\{s_{a,b}(t)\}$ is a cyclic shift of the characteristic sequence
of the polynomial $z^2 - c$, where $c \in F$ depends on $b$.

By virtue of the above proposition, we may define, for short
notation,

$$I_c = \{x \mid x = z^2 - c,\ z \in F\} \setminus \{0\}, \qquad (1)$$

$$I_c^* = \{x \mid x = z^2 - c,\ z \in F^*\} \setminus \{0\}, \qquad (2)$$

$\{s_c(t)\}$ ($\{s_c^*(t)\}$) to be its characteristic sequence in $F^*$ of
period $N = p^m - 1$, and $\theta_c(\tau)$ ($\theta_c^*(\tau)$) its periodic autocorrelation
function. There are two more cases in which $\{s_{a,b}(t)\}$
becomes a cyclic shift of $\{s_c(t)\}$ for some $c \in F$.

Proposition 2 Let $p \ge 5$, $d = 3$, and $a = -1$. For any
positive integer $m$ and any $b \in F$, $\{s_{a,b}(t)\}$ is a cyclic shift of
$\{s_c(t)\}$, where $c \in F$ depends on $b$.

Proposition 3 Let $p = 3$, $d = 4$, $a = 1$, and let $m$ be odd so
that $N = 3^m - 1 \equiv 2 \pmod 4$. For any $b \in F$, $\{s_{a,b}(t)\}$ is a
cyclic shift of $\{s_c(t)\}$, where $c \in F$ depends on $b$.


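Theorem 4 Let $\{s_c(t)\}$ and $\{s_c^*(t)\}$ be the characteristic sequences
of $I_c$ and $I_c^*$, respectively, of period $N = p^m - 1$,
and $\alpha$ be a primitive element of $F$. Then, both $\{s_\alpha^*(t)\}$ and
$\{s_1(t)\}$ are balanced, and both $\{s_\alpha(t)\}$ and $\{s_1^*(t)\}$ are almost
balanced. Furthermore, with addition taken modulo 2, we have
(i) $s_\alpha^*(t) = s_1(t - 1) + 1$ for all $t$;
(ii) $s_\alpha(t) = s_1^*(t - 1) + 1$ for all $t$;
(iii) $s_\alpha(N/2 + 1) = s_1(N/2) = 1$ and $s_1(t) = s_\alpha(t + 1) + 1$ for all $t \ne N/2$; and
(iv) $s_\alpha^*(N/2 + 1) = s_1^*(N/2) = 0$ and $s_1^*(t) = s_\alpha^*(t + 1) + 1$ for all $t \ne N/2$.

Theorem 5 The sequences $\{s_\alpha^*(t)\}$ and $\{s_1(t)\}$ of period $N$
are balanced and have optimal autocorrelation. Specifically,
for $\tau \not\equiv 0 \pmod N$,

$$\theta_\alpha^*(\tau),\ \theta_1(\tau) = \begin{cases} -4\epsilon, & \text{if } N \equiv 0 \pmod 4 \\ 2 - 4\epsilon, & \text{if } N \equiv 2 \pmod 4 \end{cases}$$

where $\epsilon \in \{0, 1\}$.

Theorem 6 The sequences $\{s_1^*(t)\}$ and $\{s_\alpha(t)\}$ of period $N$
are almost balanced and have optimal autocorrelation. Specifically,
for $\tau \not\equiv 0 \pmod N$,

$$\theta_1^*(\tau),\ \theta_\alpha(\tau) = \begin{cases} -4\epsilon, & \text{if } N \equiv 0 \pmod 4 \\ 2 - 4\epsilon, & \text{if } N \equiv 2 \pmod 4 \end{cases}$$

where $\epsilon \in \{0, 1\}$.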
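
The construction can be checked numerically in the simplest setting $m = 1$, where $F$ is the prime field GF($p$). The Python sketch below makes that restriction as an assumption, uses the smallest primitive root as $\alpha$, and introduces hypothetical function names; it builds the characteristic sequences of $z^2 - c$ and evaluates their periodic autocorrelation.

```python
def primitive_root(p):
    """Smallest generator of the multiplicative group of GF(p), p an odd prime."""
    for g in range(2, p):
        x, order = g, 1
        while x != 1:
            x = x * g % p
            order += 1
        if order == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def characteristic_sequence(p, c, star=False):
    """s(t) = 1 iff alpha^t lies in {z^2 - c} minus {0}; star=True restricts z to F*."""
    alpha = primitive_root(p)
    zs = range(1, p) if star else range(p)
    members = {(z * z - c) % p for z in zs} - {0}
    seq, x = [], 1
    for _ in range(p - 1):
        seq.append(1 if x in members else 0)
        x = x * alpha % p
    return seq


def autocorrelation(seq, tau):
    """Periodic autocorrelation: agreements minus disagreements at shift tau."""
    n = len(seq)
    return sum(1 if seq[t] == seq[(t + tau) % n] else -1 for t in range(n))


def off_peak_values(seq):
    return {autocorrelation(seq, tau) for tau in range(1, len(seq))}


for p in (5, 7, 11):
    s1 = characteristic_sequence(p, 1)
    print(p, s1, sorted(off_peak_values(s1)))
```

For these small primes the out-of-phase values stay within $\{0, -4\}$ when $N \equiv 0 \pmod 4$ and within $\{2, -2\}$ when $N \equiv 2 \pmod 4$, as the theorems state.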
Abstract — It is well-known [1] that a balanced binary
sequence $\{a_k\}$ of period $2^n - 1$ with two-level
autocorrelation is constant on cyclotomic cosets, i.e.
$\{a_{2k}\} = \{a_{k+r}\}$ for all $k$ and some fixed value of $r$.
Moreover, there is a cyclic shift of the original sequence
for which $r = 0$. Such two-level autocorrelation
sequences are in one-to-one correspondence
with cyclic Hadamard difference sets with parameters
$(2^n - 1, 2^{n-1} - 1, 2^{n-2} - 1)$. Perhaps best known
among such sequences are the m-sequences, which correspond
to Singer difference sets. For any primitive
element $\alpha$ in $GF(2^n)$, the set of m-sequences is given
by $S_q = \{\mathrm{Tr}(\alpha^{qk})\}$, $(q, 2^n - 1) = 1$, where $S_q$ and $S_{q'}$ are
distinct m-sequences iff $q$ and $q'$ belong to different
cyclotomic cosets.
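
Both properties can be observed on the shortest nontrivial case. The Python sketch below assumes $n = 3$ and the recurrence $a_{k+3} = a_{k+1} + a_k$ (from the primitive polynomial $x^3 + x + 1$), which is one possible choice made here for illustration; the function names are hypothetical.

```python
def m_sequence_period_7():
    """One period of the binary m-sequence with a[k+3] = a[k+1] + a[k] (mod 2)."""
    a = [1, 0, 0]
    while len(a) < 7:
        a.append(a[-2] ^ a[-3])
    return a


def autocorrelation(seq, tau):
    """Periodic autocorrelation: agreements minus disagreements at shift tau."""
    n = len(seq)
    return sum(1 if seq[k] == seq[(k + tau) % n] else -1 for k in range(n))


def coset_constant_shifts(seq):
    """Cyclic shifts s for which b[k] = a[k+s] satisfies b[2k] = b[k] for all k."""
    n = len(seq)
    shifts = []
    for s in range(n):
        b = [seq[(k + s) % n] for k in range(n)]
        if all(b[(2 * k) % n] == b[k] for k in range(n)):
            shifts.append(s)
    return shifts


a = m_sequence_period_7()
print(a)
print([autocorrelation(a, tau) for tau in range(1, 7)])
print(coset_constant_shifts(a))
```

Every out-of-phase autocorrelation value equals $-1$ (two-level autocorrelation), and at least one cyclic shift is constant on the cyclotomic cosets $\{0\}$, $\{1, 2, 4\}$ and $\{3, 6, 5\}$.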

If $B = \{b_k\}$ is any binary sequence of period $2^n - 1$
which is constant on cyclotomic cosets, then $B$ can
be written as a sum (term-by-term, modulo 2) of sequences
of the form $\{\mathrm{Tr}(\alpha^{qk})\}$, where $q$ need not be coprime
to $2^n - 1$. That is, the linear feedback sequences
of all periods which divide $2^n - 1$ form a basis for the set
of sequences which are constant on cyclotomic cosets.
We conjecture (based on numerical evidence) that for
two-level autocorrelation sequences, only values of $q$
which belong to cyclotomic cosets of size $n$ are involved
in this basis representation. However, not all
the component sequences in this representation need
to be m-sequences.

It has recently been shown [2] that when $n$ is odd,
all the known cases of two-level autocorrelation sequences
of period $2^n - 1$ have the same Hadamard
transform as one of the m-sequences. A similar result
holds for even $n$, but instead of an m-sequence, only
a linear feedback sequence appears.

Using the inverse Hadamard transform, and starting
with a single m-sequence (when $n$ is odd), we
can obtain all the known two-level autocorrelation sequences
of period $2^n - 1$ which have no subfield factorization.
(Here we say that the binary sequence
$B = \{b_k\}$ where $b_k = f(\alpha^k)$ and $f(x) = \sum_q \mathrm{Tr}(x^q)$ has a
subfield factorization if there is $m$, a proper factor of
$n$, such that $f(x)$ can be decomposed into a composition
of a function from $GF(2^m)$ to $GF(2)$ and the trace
function from $GF(2^n)$ to $GF(2^m)$.) We have verified
this for odd $n < 19$. Interestingly, no previously unknown
examples were found by this inverse Hadamard
transform process for any odd $n < 19$. This is supporting
evidence (albeit weak) for the conjecture that all
families of cyclic Hadamard difference sets of period
$2^n - 1$ having no subfield factorization are now known,
at least for odd $n$.

We will continue to investigate odd values of $n$, and
to look for analogous results with $n$ even.
Abstract — Crosscorrelation functions $C_d(t)$ of m-sequences
over the finite field GF($p$) and their decimation
sequences are investigated in this paper.

I.  Introduction 

Maximal length linear shift-register sequences (also called m-sequences)
have desirable autocorrelation functions, and thus they
have been widely used, e.g. in cryptography and communications.
However, the problems concerning the crosscorrelation of
m-sequences and their decimation sequences remain open,
although much research has been done.

Let $p$ be a prime, $n$ a positive integer, $q = p^n$, and GF($q$)
the finite field with $q$ elements. Let Tr denote the trace function
from GF($q$) to GF($p$), and let $\alpha$ be a primitive element in
GF($q$). Then the sequence $(a_i = \mathrm{Tr}(\gamma\alpha^i))_i$ is an m-sequence
over GF($p$) ([2, 4]), where $\gamma$ is an element of GF($q$).
The sequence $(b_i = a_{di} = \mathrm{Tr}(\gamma\alpha^{di}))_i$ is called a decimation
sequence of $(a_i)_i$ with the decimation factor $d$.

Let $(a_i)_i$ and $(b_j)_j$ be periodic sequences over GF($p$) with
period $l$, and let $\zeta = e^{2\pi\sqrt{-1}/p}$ be the primitive complex $p$th root
of unity. The crosscorrelation function of $(a_i)_i$ and $(b_j)_j$ is
defined by

$$C_{ab}(t) = \sum_{i=0}^{l-1} \zeta^{a_{i-t}}\, \overline{\zeta^{b_i}} = \sum_{i=0}^{l-1} \zeta^{a_{i-t} - b_i}, \quad 0 \le t \le l - 1. \qquad (1)$$

In particular, when $(a_j)_j$ is identical to $(b_j)_j$, $C_{ab}(t)$ is the
autocorrelation function of $(a_i)_i$. In this paper, we consider
only the case that $(a_i)_i$ is an m-sequence over GF($p$) and $(b_j)_j$
is its decimation sequence with decimation factor $d$, so (1) can
be simplified as


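$$C_{ab}(t) = \sum_{i=0}^{p^n-2} \zeta^{a_{i-t} - b_i} = \sum_{x \in GF(p^n)^*} \zeta^{\mathrm{Tr}(\gamma x - x^d)} = \sum_{x \in GF(p^n)} \zeta^{\mathrm{Tr}(\gamma x - x^d)} - 1 = C_d(t).$$

For convenience, we consider

$$1 + C_d(t) = \sum_{x \in GF(p^n)} \zeta^{\mathrm{Tr}(\gamma x - x^d)}. \qquad (2)$$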

Muller  [1]  studied  the  upper  bound  of  |l  +  Cd(t)|  for  decima¬ 
tion  factor  d  =  -I-  n  odd  and  p  =  3,  and  proposed 

an open problem: what is the upper bound of $|1 + C_d(t)|$ when
$p > 3$? In this paper, we solve this open problem and
obtain further results.


II.  Main  results 

Eg+: 
p+i 

1  -I-  P 


(1)  If  the  decimation  factor  d  =  +  v'\  1 

n  odd,  then 


p  =  3(mod4)  , 


|l  +  Cd(f)|  < 


Vp*- 


Therefore  the  problem  proposed  by  Muller  [1]  is  solved. 

(2)  If  the  decimation  factor  d  =  n  odd,  p  =  3(mod4), 

then 

Cd{t)  e  {-1,  -1  +  Vp”^.  -i  -  Vp^}- 

(3) Under the condition of (1), we have


P(\l  +  Cd(t) 

P(\l  +  Cd(t)  |=#)>1-^. 

(4) Under the condition of (2), we get the result:


P\l  +  Cd{t)  |=  JpVP*)<^, 
P(1  4-  Cd(t)  =  0)  >  1  — 


III.  Conclusions 

In this paper, we have studied in detail the crosscorrelation
functions of m-sequences and their decimation sequences
in two different cases. This paper generalizes the conclusion of
Muller in [1], and thereby solves an open problem proposed
by Muller in [1]. In addition, we have investigated the distribution
of the values of the crosscorrelation functions, and obtained the
result that when $p$ is large enough, the probability of the crosscorrelation
function achieving the maximal absolute value is
very small.

References 

[1] Eva Nuria Muller, "On the crosscorrelation of sequences over
GF(p) with short periods," IEEE Transactions on Information
Theory, vol. 45, no. 1, pp. 289-295, January 1999.

[2] R. Lidl and H. Niederreiter, Finite Fields, vol. 20 of Encyclopedia
of Mathematics and its Applications. Reading, MA: Addison-Wesley,
1980.

[3] Luogeng Hua, Introduction to Number Theory, Science Press,
Beijing, 1957.

[4] Zhexian Wan, Algebra and Coding Theory, Science Press, Beijing,
1980.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


Constabent  Properties  of  Golay-Davis-Jedwab  Sequences 

M.G.Parker1 

Code Theory Group, Institutt for
Informatikk, University of
Bergen, N-5020 Bergen, Norway
e-mail: matthew@ii.uib.no


Abstract — We conjecture that length $2^t$ bipolar
sequences with optimal or near-optimal Hadamard
and Negahadamard Peak Factors are exactly the set
of Golay Complementary sequences, as formed using
the Davis-Jedwab construction. It appears Golay sequences
are both Bent and Negabent for lengths $2^t$
where $t$ is even and $t \not\equiv 2 \bmod 3$. We also conjecture
this sequence family has near-maximum distance from
all constaaffine functions.

I.  Introduction 

The sum of aperiodic autocorrelations of Golay sequence pairs
is a $\delta$ pulse [2]. References [1, 4] describe a construction for length $2^t$
Golay sequences (the Golay-Davis-Jedwab (GDJ) construction) that
probably covers all Golay sequences of length $2^t$. We define
Hadamard, Negahadamard and Constahadamard Transforms
(HT, NHT and CHT), these being multidimensional
Cyclic, Negacyclic and Constacyclic Discrete Fourier Transforms
(DFT). Negabent and Constabent sequences are sequences
whose NHTs and CHTs, respectively, have completely
flat power profile. Extensive computation suggests that bipolar
GDJ sequences always have flat or near-flat HTs, NHTs
and CHTs. It is conjectured that these sequences are the
unique intersection of the set of bipolar sequences with Bent
or known near-Bent properties with those with NegaBent or
known near-NegaBent properties. It is known that GDJ sequences
are Bent for length $2^t$, $t$ even, [3], but the near-Bent
property for length $2^t$, $t$ odd, and the Negabent and
near-Negabent properties are new results. It is conjectured that
bipolar GDJ sequences are both Bent and Negabent for a
specified infinite set of lengths and therefore their associated
boolean functions have maximum distance from affine and
negaaffine functions. Further computations suggest they have
near-maximum distance from all constaaffine functions in all
cases. This may be desirable for cryptographic applications.

II.  The  Constahadamard  Transform 
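The Walsh-Hadamard Transform (HT), $H_t$, is constructed
from the direct product of 2-point DFT matrices,
$H_t = H_1 \otimes H_1 \otimes H_1 \otimes \ldots = \bigotimes_{k=0}^{t-1} H_1$,
where $H_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and $\otimes$ is
the direct product. The Negahadamard Transform (NHT),
$NH_t$, is the direct product of 2-point Discrete Negacyclic
Fourier Transform matrices,
$NH_t = NH_1 \otimes NH_1 \otimes NH_1 \otimes \ldots = \bigotimes_{k=0}^{t-1} NH_1$,
where $NH_1 = \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}$ and $i^2 = -1$. The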

Constahadamard Transform (CHT), $C_{n,j}H_t$, is the t-th direct
product of 2-point index-j Discrete Constacyclic Fourier
Transform (DCFT) matrices over n-th complex roots where
$2 \mid n$, $C_{n,j}H_t = C_{n,j}H_1 \otimes C_{n,j}H_1 \otimes C_{n,j}H_1 \otimes \ldots = \bigotimes_{k=0}^{t-1} C_{n,j}H_1$,
where $C_{n,j}H_1 = \begin{pmatrix} 1 & \alpha^{j} \\ 1 & \alpha^{j+n/2} \end{pmatrix}$, $\alpha = e^{2\pi i/n}$,
j is one of the integers in $Z_n$ mutually prime to n and less than and $\phi$ is
Euler's Totient Function, e.g., $H_t = C_{2,1}H_t$, $NH_t = C_{4,1}H_t$,

■'This  work  was  funded  by  NFR  Project  Number  119390/431 


and $C_{12,5}H_1 = \begin{pmatrix} 1 & \alpha^{5} \\ 1 & \alpha^{11} \end{pmatrix}$, where $\alpha = e^{2\pi i/12}$.
Constahadamard Peak Factor: Let $A = C_{n,j}H_t\,a = (A_0, A_1, \ldots, A_{2^t-1})^T$
for some n, j. The Constahadamard Peak Factor of a is
$\mathrm{CHPF}(a) = 2^{-t}\max\{A_i A_i^* \mid 0 \le i < 2^t\}$.
All CHT matrices obey Parseval's Theorem.
$1.0 \le \mathrm{CHPF}(a) \le 2^t\ \forall n, j$ if a is unimodular. A unimodular
sequence is Bent if it has Hadamard Peak Factor (HPF) of 1.0,
Negabent if it has Negahadamard Peak Factor (NHPF) of 1.0,
and ConstaBent if it has CHPF of 1.0.
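
The peak-factor definition can be tried out numerically with a minimal Python sketch. It rests on an assumption about the garbled kernel: that the 2-point matrix has rows $(1, \alpha^j)$ and $(1, \alpha^{j+n/2})$ with $\alpha = e^{2\pi i/n}$, which gives the HT for (n, j) = (2, 1) up to row order and the NHT for (4, 1). The function names are illustrative and do not come from the paper.

```python
import cmath


def cht_kernel(n, j):
    """Assumed 2-point kernel: rows (1, alpha^j) and (1, alpha^(j + n/2))."""
    alpha = cmath.exp(2j * cmath.pi / n)
    return [[1, alpha ** j], [1, alpha ** (j + n // 2)]]


def kron_transform(kernel, a):
    """Apply the t-fold direct product of a 2x2 kernel to a length-2^t vector."""
    v = list(a)
    size = len(v)
    h = 1
    while h < size:
        # Butterfly along one binary axis of the index.
        for start in range(0, size, 2 * h):
            for k in range(start, start + h):
                x, y = v[k], v[k + h]
                v[k] = kernel[0][0] * x + kernel[0][1] * y
                v[k + h] = kernel[1][0] * x + kernel[1][1] * y
        h *= 2
    return v


def peak_factor(n, j, a):
    """CHPF(a) = 2^-t * max |A_i|^2 for the transform with parameters (n, j)."""
    spectrum = kron_transform(cht_kernel(n, j), a)
    return max(abs(x) ** 2 for x in spectrum) / len(a)
```

With these assumptions, `peak_factor(2, 1, [1, 1, 1, -1])` evaluates to 1.0 up to rounding (a Bent sequence of length 4), while the all-ones sequence of length 4 has HPF 4.0 and NHPF 1.0.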

III.  CHPF  Properties  of  GDJ  Sequences 
GDJ  Sequences  are  detailed  in  [1,  4].  They  are  certain  second 
order cosets of Reed-Muller (1, t) which are length $2^t$ Golay
Complementary  Sequences.  Bipolar  GDJ  sequences  are  bent 
for  even  t  [3] .  From  computational  results  we  state, 
Conjecture  1:  The  HPF  of  a  bipolar  GDJ  sequence  is  1.0 
for  even  t  and  2.0  for  odd  t. 

Conjecture 2: The NHPF of a bipolar GDJ sequence is 1.0
for $t \not\equiv 2 \bmod 3$ and 2.0 for $t \equiv 2 \bmod 3$.

Conjecture 3: Bipolar GDJ sequences of length $2^t$ are both
Bent and Negabent for even t, $t \not\equiv 2 \bmod 3$.

Conjecture 4: Let F be the set of length $2^t$ bipolar sequences
with HPF = 1.0 and 2.0 for t even and odd, respectively. Let
G be the set of length $2^t$ bipolar sequences with NHPF = 1.0
and 2.0 for $t \not\equiv 2 \bmod 3$ and $t \equiv 2 \bmod 3$, respectively. The
set of GDJ bipolar sequences is exactly $F \cap G$.

Conjecture 5: The CHPF of GDJ bipolar sequences is
always $\le 2.00$, $\forall n, t, j$.

Conjecture  3  follows  from  Conjectures  1  and  2.  Conjecture  4 
may  not  hold  for  t  large.  Conjecture  5  implies  GDJ  boolean 
functions  have  near-maximum  distance  from  all  constaaffine 
functions. 
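
The step from Conjectures 1 and 2 to Conjecture 3 can be written out as a short congruence argument. This is a worked restatement that assumes only the two conjectured peak-factor patterns, with t reduced modulo 6:

```latex
\begin{align*}
\text{Bent (Conjecture 1):}\quad
  & t \equiv 0 \pmod 2 \iff t \bmod 6 \in \{0, 2, 4\},\\
\text{Negabent (Conjecture 2):}\quad
  & t \not\equiv 2 \pmod 3 \iff t \bmod 6 \in \{0, 1, 3, 4\},\\
\text{Bent and Negabent (Conjecture 3):}\quad
  & t \bmod 6 \in \{0, 2, 4\} \cap \{0, 1, 3, 4\} = \{0, 4\}.
\end{align*}
```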

IV.  Conclusion 

Bipolar Golay-Davis-Jedwab (GDJ) sequences appear not
only to possess low one-dimensional peak factors $\le 2.0$, but
also to possess low multi-dimensional peak factors $\le 2.0$. We
conjecture these sequences are Bent or near-Bent and
NegaBent or near-NegaBent. They appear to be Bent and
Negabent for lengths $2^t$, $t \equiv 0$ or 4 mod 6.
Abstract — In this paper we consider the outage
probabilities of three multiuser scheduling/power control
algorithms: TDMA, the K&H algorithm, which
achieves maximal Shannon capacity [1], and TD-KH,
a combination of the former two [2]. For a flat block
fading channel, the outage probability of these algorithms
can be asymptotically modeled as a reward
renewal process. Employing large deviations analysis,
TDMA and TD-KH are shown to be superior to
K&H under outage probability criteria.

I.  Introduction 

The maximal Shannon capacity for a single user in a flat fading
channel (channel state information assumed at the transmitter)
is achieved by the water pouring solution for each channel
use [3]. The K&H [1] power control extends it to multiple
users, i.e., only the user with the best fading should transmit.
The K&H power control has a significant drawback: assuming
a flat block fading channel, an arbitrary user might wait a
long time until its fading is temporarily the best, allowing
it to transmit. Thus, when delay criteria are imposed, the
K&H power control is not always suitable. Note the contrast
between K&H and TDMA, where the
time interval between successive transmissions of each user is
fixed.

The TD-KH algorithm can be regarded as a compromise
between the above contrasting approaches. Its Shannon capacity
is below that of K&H and above the TDMA capacity, and
by setting a user-controlled parameter it can achieve any value
between the two [2]. In this article we show that the TDMA and
TD-KH algorithms are advantageous over K&H when outage
capacity is concerned. Exact analytic calculation seems
intractable; we therefore resort to asymptotic calculation for
a large number of slots $T \to \infty$. The achieved result is
applicable to a general renewal process defined over discrete time.

II.  Channel  model 

For the sake of clarity, we briefly repeat the description
of the TD-KH algorithm. Consider a multiple-user flat block
fading Gaussian channel [3] where each of the N users has
the same independent fading statistics and the same average
power. Setting a user-chosen parameter $L \ge N-1$, the scheduling
policy of TD-KH is as follows: Inspect the previous L slots
before the next one. If each of the N users has transmitted at
least once in one of the L slots, then let the user with the best
fading transmit by the K&H power control. Otherwise, the
user who has not transmitted in the last L slots (there is only
one possible user) transmits using fixed power as in TDMA.
Note that for L = N - 1 the TD-KH algorithm degenerates
into TDMA, while for $L/N \to \infty$ it coincides with K&H.
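
The scheduling rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: power control is ignored, an empty history at start-up is treated as "has not transmitted" (so the first N slots serve the users in index order), and the names `td_kh_schedule` and `random_fading` are hypothetical.

```python
import random


def random_fading(n_slots, n_users, seed=0):
    """Hypothetical i.i.d. fading gains, one list of N gains per slot."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_users)] for _ in range(n_slots)]


def td_kh_schedule(fading, L):
    """Return the index of the transmitting user for every slot.

    fading[t][u] is the fading gain of user u in slot t.
    """
    n_users = len(fading[0])
    if L < n_users - 1:
        raise ValueError("L must be at least N - 1")
    last_tx = [None] * n_users  # slot of each user's latest transmission
    schedule = []
    for t, gains in enumerate(fading):
        # Users absent from the previous L slots (or never served so far).
        starved = [u for u in range(n_users)
                   if last_tx[u] is None or t - last_tx[u] > L]
        if starved:
            user = starved[0]  # TDMA-like forced transmission
        else:
            user = max(range(n_users), key=lambda u: gains[u])  # K&H rule
        last_tx[user] = t
        schedule.append(user)
    return schedule
```

For L = N - 1 the sketch reduces to a round-robin (TDMA) schedule, and for any L no user waits more than L + 1 slots between transmissions.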


Assume an arbitrary user transmitted in n slots out of T
contiguous slots. We define the outage as the probability of
the average information transmitted in these n slots being
less than a certain threshold $\alpha C$, where $\alpha \in [0, 1]$ and C is the
average sum rate capacity. Formally,

$$\text{Outage Pr}(T) = \Pr(n = 0) + \sum_{n=1}^{T} \Pr(n)\,\Pr\left(\frac{1}{n}\sum_{j=1}^{n} C_j < \alpha C\right) \qquad (1)$$

where $C_j$ is the information transmitted in each of these n slots
by the user.

III.  Outage  asymptotic 

Analytic calculation of (1) seems complicated. However, we
notice that both TDMA and K&H are renewal processes, since
the time intervals $X_1$ between two subsequent transmissions
of the same user are either fixed (TDMA) or iid (K&H). The
TD-KH algorithm can also be shown to behave asymptotically
as a renewal process. In addition, $X_2 = C_j$ depends at most on
the last interval length, thus making the process $X = (X_1, X_2)$
a reward renewal process. Using Cramér's theorem for $R^2$ we
reach:

^lirn^  —  (1/T)  log  (Outage  prob(T))  =  (2) 

o 

where  I{y\,aC)--r — I(yi , aC)  =  0  , 

oy  1 

where $I(y_1, y_2)$ is the rate function of $(X_1, X_2)$.

IV.  Results  and  Conclusions 

Inspecting (2) we note that the outage exponents of TDMA
and TD-KH both increase as $\alpha$ decreases, where the TDMA
exponent  is  always  above  the  TD-KH  .  For  a  €  [0 ,a6C]  the 
TDMA and TD-KH exponents are larger than that of K&H. The K&H
exponent is upper bounded by $\log(1 - 1/N)$ even for $\alpha \to 0$.
The TD-KH exponent, on the other hand, is not upper
bounded as in K&H even for large L/N (e.g. L/N = 5).
Thus, we conclude that under the asymptotic outage criteria
TD-KH is superior to K&H.
Abstract — Relying on a simple algorithm for
the Laplace transform inversion of cumulative distribution
functions, we develop a moment generating
function-based numerical technique for the outage
probability evaluation of maximal-ratio and equal-gain
combining over generalized fading channels.


I.  Introduction 

Recently, a unified moment generating function (MGF)-based
approach was adopted for the exact average error rate analysis
of several modulation schemes in conjunction with maximal-ratio
combining (MRC) and equal-gain combining (EGC) diversity
reception [1]. In addition to the average error rate,
outage probability, $P_{out}$, is another standard performance
criterion of communication systems operating over fading channels.
It is defined as the probability that the combined
signal-to-noise ratio (SNR), $\gamma_t$, falls below a threshold $\gamma_{th}$, i.e.,
$P_{out} = P[0 \le \gamma_t \le \gamma_{th}] = \int_0^{\gamma_{th}} p_{\gamma_t}(\gamma_t)\,d\gamma_t$, where $p_{\gamma_t}(\gamma_t)$
is the probability density function (PDF) of $\gamma_t$. Since finding
the PDF of $\gamma_t$ in closed form is often restricted to some
special cases while the MGF of $\gamma_t$, $M_{\gamma_t}(s) = E_{\gamma_t}[e^{s\gamma_t}]$, can
be obtained in a simple form for various fading conditions, we
present an MGF-based approach for the outage probability
evaluation of diversity systems over generalized fading channels
in which the diversity paths are not necessarily independent,
identically distributed, or even distributed according to
the same family of distributions.


II.  Outage  Probability  Evaluation 

The total conditional SNR per symbol, $\gamma_t$, at the output of an
L-branch MRC combiner or a postdetection EGC combiner is
given by $\gamma_t = \sum_{l=1}^{L} \gamma_l$, where $\gamma_l$ is the l-th path instantaneous
SNR per symbol. Applying the numerical technique developed
in [2] and after some manipulation we obtain $P_{out}$ as [3]


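$$P_{out} = \frac{2^{-Q} e^{A/2}}{\gamma_{th}} \sum_{q=0}^{Q} \binom{Q}{q} \sum_{n=0}^{N+q} \frac{(-1)^n}{\beta_n}\, \Re\left\{ \frac{M_{\gamma_t}\left(-\frac{A+2\pi j n}{2\gamma_{th}}\right)}{\frac{A+2\pi j n}{2\gamma_{th}}} \right\} + E(A,N,Q), \qquad (1)$$

with $\beta_0 = 2$ and $\beta_n = 1$ for $n \ge 1$, and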


where  the  parameters  A,  N ,  and  Q  can  be  set  to  guarantee 
an  overall  error  given  by 


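$$|E(A,N,Q)| \simeq \frac{e^{-A}}{1-e^{-A}} + \left| \frac{2^{-Q} e^{A/2}}{\gamma_{th}} \sum_{q=0}^{Q} (-1)^{N+1+q} \binom{Q}{q}\, \Re\left\{ \frac{M_{\gamma_t}\left(-\frac{A+2\pi j(N+q+1)}{2\gamma_{th}}\right)}{\frac{A+2\pi j(N+q+1)}{2\gamma_{th}}} \right\} \right|. \qquad (2)$$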
Figure 1: Outage probability with MRC or postdetection EGC
(L = 4) versus normalized average SNR of the first path $\bar{\gamma}_1/\gamma_{th}$
over an exponentially decaying PDP and an exponential correlation
profile across the multipaths ((a) $\rho = 0$, (b) $\rho = 0.2$, and (c) $\rho = 0.4$).


For coherent EGC, the conditional combined SNR per symbol,
$\gamma_t$, is given by $\gamma_t = \frac{1}{L}\left(\sum_{l=1}^{L} \sqrt{\gamma_l}\right)^2$. The outage
probability $P_{out}$ can hence be rewritten as $P_{out} = P[0 \le \tilde{\gamma}_t \le \tilde{\gamma}_{th}]$,
where $\tilde{\gamma}_t = \sum_{l=1}^{L} \sqrt{\gamma_l}$ and $\tilde{\gamma}_{th} = \sqrt{L\gamma_{th}}$. Since the MGF of
$\sqrt{\gamma_l}$ can be found in closed form for the Nakagami-m case,
the outage probability of coherent EGC receivers can also be
computed using (1), and the corresponding numerical error
can be estimated from (2), where in these two expressions $\gamma_t$
and $\gamma_{th}$ are replaced by $\tilde{\gamma}_t$ and $\tilde{\gamma}_{th}$, respectively.
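
The MGF-based outage computation can be illustrated with a minimal Python sketch of Euler-summation Laplace inversion of the CDF. The exact form of (1) is assumed here to be the standard Abate-Whitt style sum with parameters A, N and Q. The Rayleigh MRC test case with i.i.d. branches and the function names are illustrative choices, not taken from the paper.

```python
import math


def outage_probability(mgf, gamma_th, A=18.4, N=30, Q=11):
    """Approximate P[gamma_t <= gamma_th] from the MGF M(s) = E[exp(s * gamma_t)].

    The Laplace transform of the CDF is M(-s) / s; the alternating series of
    its samples is accelerated by Euler summation (binomial averaging).
    """
    partial_sums = []
    running = 0.0
    for n in range(N + Q + 1):
        s = complex(A, 2.0 * math.pi * n) / (2.0 * gamma_th)
        term = (mgf(-s) / s).real
        beta_n = 2.0 if n == 0 else 1.0
        running += (-1) ** n * term / beta_n
        partial_sums.append(running)
    euler = sum(math.comb(Q, q) * partial_sums[N + q] for q in range(Q + 1))
    return 2.0 ** (-Q) * math.exp(A / 2.0) / gamma_th * euler


def rayleigh_mrc_mgf(mean_snr, branches):
    """MGF of the MRC output SNR for i.i.d. Rayleigh branches (assumed example)."""
    return lambda s: (1.0 - s * mean_snr) ** (-branches)
```

For a single Rayleigh branch with unit mean SNR the exact outage at threshold 1 is $1 - e^{-1}$, which gives a simple sanity check of the inversion.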

III.  Numerical  Example 

As an illustration of the applicability of the approach to cases
where a "classical" PDF-based approach fails to give an
easy-to-compute solution, Fig. 1 shows the outage probability of
MRC RAKE reception over a Nakagami channel with an
exponentially decaying power delay profile (PDP) ($\bar{\gamma}_l = \bar{\gamma}_1 e^{-\delta(l-1)}$,
where $\delta$ is the power decay factor) and an exponential
correlation profile (such as $\rho_{nn'} = \rho^{|n-n'|}$) across the multipaths.
Abstract — We derive the optimal power allocation
and minimum outage probability for fading multiple-access
channels under the assumption that both the
transmitters and the receiver have perfect channel
side information. This minimum outage probability
implicitly defines the outage capacity region. Two
different assumptions about whether the outage
declaration from each user is simultaneous or independent
are considered.

Wireless  communication  channels  vary  over  time  due  to 
user  mobility.  Assuming  that  the  channel  side  information 
(CSI) is available at both the transmitter(s) and the receiver(s),
the zero-outage capacity regions are derived for fading
multiple-access channels (MAC) and for fading broadcast
channels  in  [1]  and  [2],  respectively.  This  type  of  capacity 
is  the  maximum  constant  rate  that  can  be  maintained  in  all 
fading  conditions  through  optimal  power  control.  By  allow¬ 
ing  some  transmission  outage  under  severe  fading  conditions, 
the  maximum  rate  that  can  be  kept  constant  during  non¬ 
outage  will  increase.  Finding  the  optimal  power  allocation 
that  achieves  the  outage  capacity  for  a  given  outage  proba¬ 
bility  is  tantamount  to  deriving  the  allocation  strategy  that 
minimizes  the  outage  probability  for  a  given  rate  or  rate  vec¬ 
tor.  This  minimum  outage  probability  problem  is  solved  for 
a  single-user  fading  channel  in  [3]  and  for  a  fading  broadcast 
channel  in  [2]. 

In  this  paper  we  consider  the  optimal  power  allocation  and 
minimum  outage  probability  problem  for  an  M- user  fading 
MAC  under  different  assumptions  about  whether  the  outage 
declaration  from  each  user  is  simultaneous  or  independent.  A 
discrete-time  M-user  fading  MAC  model  as  discussed  in  [1]  is 
characterized  by  the  output 

$$Y(n) = \sum_{i=1}^{M} \sqrt{H_i(n)}\,X_i(n) + Z(n),$$

where $X_i(n)$ and $H_i(n)$ are the transmitted waveform and the
fading process of the i-th user, respectively, and $Z(n)$ is the
Gaussian noise with variance $\sigma^2$. For a slowly time-varying
MAC, let $h = (h_1, h_2, \ldots, h_M)$ be the joint fading state at a
particular time n, i.e., $H(n) = h$.

In the zero-outage case, given an average power constraint
vector $P^* = (P_1^*, P_2^*, \ldots, P_M^*)$ and a rate vector
$R = (R_1, R_2, \ldots, R_M)$ for the M users, an iterative algorithm (we
will refer to it as the Hanly-Tse (HT) Algorithm) is proposed
in [1] for obtaining the optimal power allocation strategy that
solves

$$\inf_{\mathcal{P}} \max_{1 \le i \le M} \frac{P_i(R)}{P_i^*}, \qquad (1)$$

where $\mathcal{P}$ denotes a power allocation policy and $P_i(R)$ is the
resulting average transmit power of each user i required to
support R in every fading state without any outage. Therefore,
rate vector R lies in the zero-outage capacity region if
and only if the infimum in (1) is no greater than 1.

Now  if  the  infimum  in  (1)  is  larger  than  1,  the  given  rate 
vector  R  can  only  be  maintained  with  a  non-zero  outage  prob¬ 
ability  for  some  or  all  of  the  M  users.  In  this  case,  under  the 
assumption  that  the  transmission  from  all  users  is  turned  on 
or  off  simultaneously,  we  wish  to  obtain  the  minimum  common 
outage probability $Pr^* = Pr_{\min}(P^*, R)$ and the corresponding
optimal power allocation. Under the alternative assumption
that the transmission from each user is turned on or off
independently, we wish to obtain the outage probability region
$\mathcal{O}_I(P^*, R)$ and the optimal power allocation that achieves the
boundary surface of $\mathcal{O}_I(P^*, R)$, where $\mathcal{O}_I(P^*, R)$ is the set
of all average outage probability vectors for which R can be
maintained with the average transmit power of each user i no
larger than $P_i^*$, $\forall\, 1 \le i \le M$.

Under  the  first  assumption,  given  the  rate  vector  R  and 
power  constraint  vector  P*  fixed,  for  each  common  outage 
probability  Pr  >  0,  we  use  a  similar  algorithm  as  the  HT 
Algorithm  to  find  the  power  allocation  that  solves 


$$\inf_{\mathcal{P}} \max_{1 \le i \le M} \frac{P_i(Pr, R)}{P_i^*}, \qquad (2)$$


where $\mathcal{P}$ denotes a power allocation policy for which the common
outage probability is Pr and the resulting average transmit
power of each user i is $P_i(Pr, R)$. By denoting the infimum
in (2) as Inf(Pr), it can be shown that Inf(Pr) is a
strictly decreasing function of Pr [4]. Therefore, it is clear
that $\mathrm{Inf}(Pr) \ge 1$ if $Pr \le Pr^*$ and $\mathrm{Inf}(Pr) < 1$ otherwise,
with equality achieved when $Pr = Pr^*$. We propose an iterative
algorithm that converges to the power allocation satisfying
$\mathrm{Inf}(Pr^*) = 1$, and finds the minimum common outage
probability $Pr^*$.
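
Because Inf(Pr) is strictly decreasing and equals 1 at the minimum common outage probability, that probability can be located by a one-dimensional root search. The Python sketch below uses plain bisection as a stand-in. It is not the iterative algorithm of the paper, and `inf_ratio` is a hypothetical callable that would wrap the inner power-allocation problem (2).

```python
def min_common_outage(inf_ratio, tol=1e-9):
    """Bisection for Pr* such that inf_ratio(Pr*) = 1.

    inf_ratio is assumed to be strictly decreasing on [0, 1] with
    inf_ratio(1) <= 1 (declaring outage always needs no power).
    """
    if inf_ratio(0.0) <= 1.0:
        return 0.0  # the rate vector is already in the zero-outage region
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inf_ratio(mid) > 1.0:
            lo = mid  # power budget still exceeded: more outage is needed
        else:
            hi = mid
    return hi
```

With the toy function `lambda pr: 2.0 * (1.0 - pr)` the search converges to 0.5.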

Under the alternative assumption that an outage can be
declared independently for each user, a similar iterative
algorithm is proposed to obtain the boundary surface of the outage
probability region $\mathcal{O}_I(P^*, R)$.
Abstract — Peak transmitted power is a key issue
in  wireless  systems.  In  this  paper  we  consider  upper 
and  lower  bounds  to  Gallager’s  random  coding  error 
exponent  [1]  for  the  two  dimensional  (or  quadrature) 
memoryless  flat  fading  channel  with  perfect  channel 
state  information  (CSI)  at  the  receiver  and  when  a 
peak  power  constraint  is  imposed  at  the  transmitter. 

I.  Summary 

Due  to  space  limitations,  we  only  provide  the  final  results. 
For more details, the reader is referred to [2].

Let us assume a discrete-time memoryless fading channel
(with AWGN) for which the received symbol Y is equal
to $y = \upsilon x + w$, where X is a transmitted symbol, $\Upsilon$ is a
known fading variable and W is the AWGN term. For an
input per-letter peak power constraint of the form $|x|^2 \le \mathcal{P}$,
an upper bound to the ensemble block decoding error
probability, for any choice of $\rho$, $0 \le \rho \le 1$, is easily
determined to be [2], [1] $P_e \le \exp\{-N E_r(q(x), \rho, R)\}$,
where the error exponent, $E_r(q(x), \rho, R)$, is defined as
$E_r(q(x), \rho, R) = E_0(q(x), \rho) - \rho R$, where
$E_0(q(x), \rho) = -\ln\left\{\int_{\upsilon} p(\upsilon) \int_y \left[\int_x q(x)\, p(y|x,\upsilon)^{1/(1+\rho)}\,dx\right]^{1+\rho} dy\,d\upsilon\right\}$.
The input distribution q(x) follows the general form
$q(x) = g(x)\,u(\mathcal{P} - |x|^2)$, where $u(\cdot)$ is the unit-step function and
g(x) satisfies $\int_{x:\,|x|^2 \le \mathcal{P}} g(x)\,dx = 1$. The random coding
exponent $E_r(R)$ is achieved by maximizing $E_r(q(x), \rho, R)$ over
all q(x) and $\rho$. Finally, and without loss of generality, let
$\sigma_{\upsilon}^2 = 0.5$, in order to obtain a unity power fading. Also, let
$\sigma_w^2 = 1$; then the peak-power-to-noise ratio (PPNR) is
defined as $\mathrm{PPNR} = \mathcal{P}/(2\sigma_w^2) = \mathcal{P}/2$.

Instead  of  optimizing  over  the  input  distribution,  which  is 
a  difficult  task,  we  propose  upper-  and  lower-bounds  to  the 
exponent  so  as  to  trap  this  function  to  a  reasonable  degree  of 
accuracy. 

An upper bound to the error exponent $E_r(R)$ can be
shown to be [2] $E_r(R) \le \max_{0 \le \rho \le 1}\{E_{0,u}(\rho) - \rho R\}$, where

E0,u(p)  =  p  -  ln(l  +  p)  -  ln£p(„)  +  j- 

It should be mentioned that the aforementioned upper bound
not  only  is  an  upper  bound  to  the  error  exponent  of  the  per- 
letter  peak-power  limited  channel,  but  also  is  an  upper  bound 
to  the  random  coding  exponent  of  the  per-letter  average-power 
limited  channel  (see  [2]  for  clarification). 

1 This research has been performed at the ECE Dept. Queen's
Univ., Canada. It has been partially supported by the
Telecommunications Research Institute of Ontario (TRIO) and the Natural
Sciences and Engineering Council of Canada NSERC.

Peter J. McLane
ECE Dept., Queen's University
Kingston, Ontario
Canada K7L 3N6
mclanepfflqucdnee . ee . queensu . ca

A lower bound to the random coding exponent for the
peak-power-limited ideal fading channel can be determined
using any input distribution that satisfies the peak-power
constraint. We have attempted three different input
distributions. The first is a rectangular-uniform input distribution

which  has  the  pdf  q  (x)  =  (^)  U  'u  )-  where 

U(-)  is  defined  as  U(f)  =  1  ]  otherwise  '  secon<^  a*;' 

tempt  is  a  circular-uniform  distribution  which  has  the  pdf 
f  J_  •  M2  <  -p 

q  ( x )  =  <  T7*  ’  }  ~  .  .  The  last  attempt  is  a  conical 

'  '  10  ;  otherwise 

distribution  with  pdf  q  (x)  =  u  (V  —  |x|2) , 

where  u  (•)  denotes  the  unit-step  function.  It  has  been  possi¬ 
ble  [2]  to  derive  a  closed-form  expression  for  the  lower  bound 
based  on  the  conical  input  pdf.  For  the  rectangular-uniform 
pdf,  a  Monte-Carlo-integration  based  methodology  has  been 
used  to  numerically  calculate  the  bound.  Finally,  for  the 
circular-uniform  pdf,  only  asymptotic  results  at  high  peak- 
power-to-noise  ratio  (PPNR),  as  well  as  cut-off  rate  numerical 
calculations  at  any  PPNR,  have  been  feasible.  On  the  other 
hand, the lower bound based on the conical pdf is the loosest;
the bound based on the rectangular-uniform pdf is tighter
than the one based on the conical pdf. Finally, the bound
based  on  the  circular-uniform  pdf  is  the  tightest. 

II.  Results 

Upon  evaluation  of  the  upper  and  lower  bounds  proposed 
in this paper, it has been found that at high PPNR, the
differences between the upper bound to the cut-off rate and the
corresponding lower bounds, due to the rectangular- and the
circular-uniform inputs, are 1.45 and 1.0 nats/symbol,
respectively. Also, the difference between the upper bound to the
cut-off rate and the corresponding lower bound based on the
conical pdf is 1.61 nats/symbol. It should be mentioned that
the  aforementioned  asymptotic  (high  PPNR)  differences  are, 
in  fact,  independent  of  the  fading  distribution. 

In  conclusion,  since  the  upper  bound  we  propose  is  also  an 
upper  bound  to  the  random  coding  exponent  for  the  channel 
with  average-power-constrained  inputs,  it  follows  that  the  loss 
in  the  error  exponent,  due  to  the  peak-power  constraint  and 
relative  to  the  average-power-constrained  channel,  is  no  more 
than  1.0  nats/symbol,  which  is  equal  to  1.44  bits/symbol. 
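
The unit conversion behind the last figure can be written out explicitly. The block below is a small worked example that applies the same factor to the asymptotic gaps quoted in this section, rounded to two decimals:

```latex
\begin{align*}
1\ \text{nat} &= \log_2 e\ \text{bits} \approx 1.4427\ \text{bits},\\
1.0\ \text{nats/symbol} &\approx 1.44\ \text{bits/symbol},\\
1.45\ \text{nats/symbol} &\approx 2.09\ \text{bits/symbol},\\
1.61\ \text{nats/symbol} &\approx 2.32\ \text{bits/symbol}.
\end{align*}
```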
Abstract — The hard-square model consists of all
binary arrays in which the 1's are isolated both
horizontally and vertically. Based on a certain probability
measure defined on those arrays, an efficient
variable-to-fixed-rate encoding scheme is obtained that maps
unconstrained binary words into arrays that satisfy
the hard-square model. For sufficiently large arrays,
the average rate of the encoder approaches a value
which is only 0.1% below the capacity of the
constraint. A second, fixed-rate encoder is obtained
whose rate for large arrays is within 1.2% of the
capacity value.

I.  Introduction 

Recent  developments  in  optical  storage  are  attempting  to 
increase  the  recording  density  by  exploiting  the  fact  that  the 
recording  device  is  two-dimensional  in  nature.  This,  in  turn, 
motivates  the  study  of  coding  schemes  for  two-dimensional 
constraints  that  may  be  present  in  those  devices. 

The  hard-square  model,  defined  next,  is  a  notable  example 
of  such  a  constraint.  Consider  (without  real  loss  of  generality) 
the  parallelograms 

$$A_{m,n} = \{(i,j) \in \mathbb{Z}^2 : 0 \le i < m,\ 0 \le i + j < n\}$$

and mappings $x : A_{m,n} \to \{0, 1\}$, where hereafter $x_{i,j}$ denotes
the value of x at location $(i,j) \in A_{m,n}$. We say that such a
mapping x satisfies the hard-square model if $x_{i,j} = 1$ implies
$x_{i,j+1} = 0$ (when j < n-1) and $x_{i+1,j} = 0$ (when i < m-1).
The set of all mappings over $A_{m,n}$ that satisfy the hard-square
model will be denoted by $\mathcal{S}(A_{m,n})$.

The main goal of this work is designing efficient lossless
coding schemes of unconstrained binary words into elements
of $\mathcal{S}(A_{m,n})$.
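
The constraint itself is easy to check and to count on small arrays. The Python sketch below works on ordinary m x n rectangles rather than the parallelograms used in the text (a simplifying assumption), and its function names are illustrative. Counting uses a row-by-row transfer recursion over rows without adjacent 1's.

```python
import math


def satisfies_hard_square(x):
    """True if no two 1's in the 0/1 array x are horizontal or vertical neighbours."""
    m, n = len(x), len(x[0])
    for i in range(m):
        for j in range(n):
            if x[i][j] == 1:
                if j < n - 1 and x[i][j + 1] == 1:
                    return False
                if i < m - 1 and x[i + 1][j] == 1:
                    return False
    return True


def count_hard_square(m, n):
    """Number of m x n binary arrays that satisfy the hard-square model."""
    rows = [r for r in range(1 << n) if (r & (r >> 1)) == 0]
    counts = {r: 1 for r in rows}
    for _ in range(m - 1):
        # A new row r may follow a previous row p only if they share no 1 column.
        counts = {r: sum(c for p, c in counts.items() if (p & r) == 0)
                  for r in rows}
    return sum(counts.values())


def finite_size_rate(m, n):
    """log2(count) / (m * n): bits per cell available on an m x n rectangle."""
    return math.log2(count_hard_square(m, n)) / (m * n)
```

For example, a 2 x 2 array admits 7 valid configurations and a 3 x 3 array admits 63; `finite_size_rate` decreases towards the capacity of the constraint as the array grows.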

II.  VARIABLE-TO-FIXED-RATE  ENCODER 

Based on the idea of two-dimensional bit-stuffing introduced
in [3], we obtain a variable-to-fixed-rate encoder into
$\mathcal{S}(A_{m,n})$. Our encoder effectively realizes the following
probability measure $\mu_{m,n}$ on $\mathcal{S}(A_{m,n})$: for every $x \in \mathcal{S}(A_{m,n})$,

flm,n{x'f  —  Jin  ^  (xo,0,  Xo,l ,  ■  ■  -  ,  Xo,n-l  ) 

Pm  (Xl ,  —  1 ,  X2,  — 2,  -  •  •  ,  Xm_l,_(m_l)) 

m—1  n—  1  —  t 

n  n 

i=l  j'=  — i+1 

1 This work was supported in part by Grants Nos. 95-00522 and
98-00199 from the United-States-Israel Binational Science
Foundation (BSF), Jerusalem, Israel, by Grant No. NCR-9612802 of the
National  Science  Foundation  (NSF),  and  by  the  Center  for  Magnetic 
Recording  Research  at  the  University  of  California,  San  Diego. 


where, for two parameters $q_0 \in [0, 1)$ and $q_1 \in (0, 1]$,
tf(0|W)  =  l-tf(l|W)  =  {  “  ' 

The  boundary  measures,  fi^  and  fim  \  are  set  so  that  the  non¬ 
boundary  values  have  a  stationary  distribution.  The  limit 

$$H = \lim_{m,n\to\infty} -\frac{1}{mn} \sum_{x \in \mathcal{S}(A_{m,n})} \mu_{m,n}(x) \log_2 \mu_{m,n}(x)$$

exists and can be written explicitly as a function of $q_0$ and $q_1$.
Maximizing this function yields $H \approx 0.587277$, which is the
average rate of our encoder. This rate is only 0.1% below the
capacity value of the hard-square model [1], [2], [4].
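
The quoted gap can be checked with one line of arithmetic. The capacity value used below, about 0.587891 bits per symbol, is the figure commonly reported for the hard-square constraint; it is not stated in this abstract and is taken here as an assumption.

```latex
\[
\frac{C - H}{C} \approx \frac{0.587891 - 0.587277}{0.587891}
\approx 1.04 \times 10^{-3} \approx 0.1\%.
\]
```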

III.  Fixed-rate  encoding  scheme 

With a slight compromise on the rate, we can also obtain
an efficient fixed-rate encoder into $\mathcal{S}(A_{m,n})$. Let $S_{\ell,r}$ be the
set of all words in $\{0, 1\}^{\ell}$ of Hamming weight r in which the
1's are isolated, and for a prescribed positive integer t define

t-i  , 

K(n,t)  =  y^  2s  ■  (  &  J  ■  |<5n-3t+2,t-4|  ■ 

s=0  '  ' 

The images of our encoder are elements $x \in \mathcal{S}(A_{m,n})$ that
satisfy the weight constraint $\sum_j x_{i,j} = t$ for each i. The coding
rate is $\lfloor \log_2 K(n,t) \rfloor / n$, and the weight constraint allows
efficient encoding through enumerative coding.

It can be shown that for every fixed rational $\delta$,

$$\limsup_{n\to\infty}\ (1/n) \cdot \log_2 K(n, \delta n) \ge \sup_{p} F(\delta, p),$$

where p ranges over $[0, \min\{\delta/(1-3\delta), 1/2\}]$ and

$$F(\delta,p) = \delta\,[1 + h((1/\delta - 3)p)] + (1-3\delta)\,[(1-p)\,h(p/(1-p)) - p],$$

with h(t) standing for $-t \log_2 t - (1-t)\log_2(1-t)$. Maximizing
over $\delta$, we thus obtain the coding rate

$$\max_{(\delta,p)} F(\delta,p) \approx 0.581074,$$

which  is  within  1.2%  of  the  capacity  value  of  the  hard-square 
model. 
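
The maximization of F can be reproduced approximately with a coarse grid search. The Python sketch below assumes the reading $F(\delta,p) = \delta[1 + h((1/\delta - 3)p)] + (1-3\delta)[(1-p)h(p/(1-p)) - p]$ of the garbled formula, restricts $\delta$ to (0, 1/3), and uses illustrative function names.

```python
import math


def h(t):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)


def F(delta, p):
    """Assumed form of the rate lower bound for row weight delta * n."""
    return (delta * (1.0 + h((1.0 / delta - 3.0) * p))
            + (1.0 - 3.0 * delta) * ((1.0 - p) * h(p / (1.0 - p)) - p))


def maximize_F(steps=400):
    """Grid search over 0 < delta < 1/3 and 0 <= p <= min(delta/(1-3*delta), 1/2)."""
    best = (0.0, 0.0, 0.0)
    for a in range(1, steps):
        delta = a / (3.0 * steps)
        p_max = min(delta / (1.0 - 3.0 * delta), 0.5)
        for b in range(steps + 1):
            p = p_max * b / steps
            value = F(delta, p)
            if value > best[0]:
                best = (value, delta, p)
    return best  # (maximum value, delta, p)
```

Under this reading the search should land close to the quoted 0.581074, with the maximizing $\delta$ a little above 0.2.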
Abstract — A two-dimensional code for a second
order spectral null constraint is given and it is shown
that the rate of the code is asymptotically equal to
1.


I.  Introduction 

Two-dimensional recording devices have recently been developed,
and several authors have therefore studied two-dimensional codes
for these devices [1], [2].

It  has  been  shown  that  there  are  encoding  algorithms  for 
two-dimensional  spectral  null  constraints  with  asymptotical 
code  rate  1  [3].  In  this  paper  we  introduce  a  coding  rule 
for  two-dimensional  second  order  spectral  null  constraints  and 
show  that  its  code  rate  is  asymptotically  1.  We  describe  an 
outline  of  our  coding  rule. 

II.  Preliminaries 

Let $a = a_0 a_1 \cdots a_{l-1}$ be a finite sequence of numbers. The
running digital sum of a, denoted by RDS(a), and the second
order running digital sum of a, denoted by $RDS^{(2)}(a)$,
are defined as follows: $RDS(a) = \sum_{j=0}^{l-1} a_j$ and
$RDS^{(2)}(a) = \sum_{j=0}^{l-1} RDS(a_0 a_1 \cdots a_j)$. If RDS(a) = 0 then we say that
a satisfies a spectral null constraint at dc. If
$RDS(a) = RDS^{(2)}(a) = 0$ then we say that a satisfies a second order
spectral null constraint at dc. For a symbol a we define $\bar{a}$ by
$\bar{a} = -a$, and for a sequence a we define $\bar{a}$ by
$\bar{a} = \bar{a}_0 \bar{a}_1 \cdots \bar{a}_{l-1}$. We introduce an equivalence
relation '$\equiv$' of sequences a and b such that $a \equiv b$ if
a = b or $a = \bar{b}$.

Let  A  be  a  two-dimensional  array.  If  all  rows  and  all 
columns  of  A  satisfy  a  second  order  spectral  null  constraint 
at  dc  then  we  say  that  A  satisfies  the  constraint  horizontally 
and vertically, respectively. If an array satisfies a second order
spectral  null  constraint  both  horizontally  and  vertically  then 
we  say  that  the  array  satisfies  a  two-dimensional  second  order 
spectral  null  constraint. 
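
The definitions in this section translate directly into a few helper functions. The Python sketch below reads RDS as the sum of all symbols and the second order RDS as the sum of all prefix sums, which is how the definitions above appear to read; the function names are illustrative.

```python
def rds(a):
    """Running digital sum: the sum of all symbols of a."""
    return sum(a)


def rds2(a):
    """Second order running digital sum: the sum of all prefix sums of a."""
    total, running = 0, 0
    for symbol in a:
        running += symbol
        total += running
    return total


def has_second_order_null(a):
    """True if a satisfies the second order spectral null constraint at dc."""
    return rds(a) == 0 and rds2(a) == 0


def array_has_2d_second_order_null(array):
    """True if every row and every column satisfies the constraint."""
    rows_ok = all(has_second_order_null(row) for row in array)
    cols_ok = all(has_second_order_null(col) for col in zip(*array))
    return rows_ok and cols_ok
```

For instance, the sequence (1, -1, -1, 1) satisfies the second order constraint, and so does its inverse, which is why row inversion can be used later without breaking the horizontal constraint.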

III.  Outline  of  Coding  Rule 

We  assume  that  the  channel  symbol  alphabet  is  {—1,1}. 
Let  Ao  be  a  two-dimensional  array  of  size  m  x  n. 

Step 1 First we encode $A_0$ into a two-dimensional array
A which satisfies a second order spectral null constraint at
dc horizontally by using the Henry-Knuth method [4] and a
method by Tallini et al. [5]. Then the length of each row of
A is $n_1 = n + 4\lceil \log n \rceil$.

Step 2 Let $p_i$ be the i-th row of A and let
$\{q_1, q_2, \ldots, q_N\}$ be the set of all distinct code words
appearing as rows in A, where we identify p with p' if $p \equiv p'$. Without
loss of generality we assume that the first symbol of $q_i$ is 1 for
i = 1, 2, ..., N. We define L(i) to be the number of elements
in $\{j : p_j \equiv q_i\}$. We define $\ell(i,j)$, i = 1, 2, ..., N, so that
$p_{\ell(i,j)} \equiv q_i$ for j = 1, 2, ..., L(i) and $\ell(i,j) < \ell(i,j+1)$ for
j = 1, 2, ..., L(i) - 1.

Step  3  We  extend  A  vertically  (L(i)  and  l(i,  j)  are  also 
extended  at  the  same  time)  by  duplicating  rows  so  that  L(i)  is 
a  multiple  of  4.  Let  J  be  the  number  of  rows  of  the  resulting 
array. 

Step  4,  Step  5  and  Step  6  We  invert  rows  so  that 
the  absolute  values  of  the  second  order  running  digital  sums 
of  columns  can  be  bounded  from  above  by  2ni  J. 

Step  7,  Step  8  and  Step  9  We  add  rows  so  that  the 
second  order  running  digital  sum  of  each  column  of  the  result¬ 
ing  array  is  1,  -1  or  0.  This  can  be  accomplished  by  adding  at 
most  2”1  |"2\/jj  rows.  We  also  add  rows  to  the  resulting  array 
in  order  that  all  columns  satisfy  the  second  order  spectral  null 
constraint.  The  number  of  extra  rows  is  constant. 

Step 10 Let b(i), i = 1, ..., m be a sequence such that
b(i) = 1 if the i-th row in $A_0$ is inverted and b(i) = -1
otherwise. We concatenate the resulting array we get above and
b(1), b(2), ..., b(m) so that they satisfy the two-dimensional
second order spectral null constraint.

IV.  Code  Rate 

The number of columns of A is $n_1$. In step 4 we add at most
$2^{n_1}$ rows to A. In step 8 we add at most $W 2^{n_1}$ rows, where
W is the smallest multiple of 4 with $4(m + 2^{n_1}) 2^{n_1} < W^2$.
In step 9 we add at most $4 \cdot 2^{n_1}$ rows. In step 10 we add at
most  4  |"  ^  J  rows.  Therefore  the  code  rate  of  our  algorithm 
is  bounded  from  below  by 

mn 

m  (m  +  2"1  +  f  v/4(m  +  2ni)2nij  +  4  +  4  ■  2"1  +  4 

We consider a rectangular array of size $2^{2n} \times n$. Then the
above rate tends to 1 as $n \to \infty$.
Abstract — This paper proposes a joint decoding method for a
system that combines a 2-Dimensional Partial Response (2-D PR)
channel with convolutional codes, aimed at high-density magnetic
recording. Since both the 2-D PR system and the convolutional
code can be represented by trellis diagrams, decoding performance
can be improved by decoding jointly on their overall trellis
diagram. A decoding method with a low amount of calculation is
also investigated.

I.  Channel  Model 

If the intertrack interference (ITI) is known in advance, the 2-D PR
system (3 tracks, 3 heads) can be represented as in figure 1, where
"a" is the amount of ITI.


Figure  1:  2-D  PR4  (3  Tracks  -  3  Heads) 


From figure 1, we assume that the ITI from the outer tracks
on the center track is larger than the ITI from the
center track on an outer track. Computer simulation
confirmed that the BER performance of the center track is worse
than that of the outer tracks (figure 2).


Figure  2:  BER  performance 


II.  Convolutional  Code  and  2-D  PR4 

Since the BER of the center track is worse than that of the
outer tracks, it is useful to apply error-correcting codes to 2-D
PRML such that the data in the center track are protected
against a greater number of errors than the data in the outer
tracks. As described above, both the 2-D PR system and the
convolutional code can be represented by trellis diagrams, and
decoding performance can be improved by joint decoding
on their overall trellis diagram. Therefore, we consider a joint
decoding system with convolutional codes and the 2-D PR4 system.
The system model is shown in figure 3.

We denote the contents of the memory elements by $a_{n,l}$
(memory elements in the convolutional codes) and $b_{n,l}$ (memory
elements in the PR4 system), where n is the time instant and l
is the track number. In this system, outputs are affected by
3 bits and 2 time instants, so the state of the system, $S_p$, is
defined as the $3 \times 4$ matrix

$$S_p = \begin{pmatrix}
a_{n-1,1} & a_{n-2,1} & b_{n-1,1} & b_{n-2,1} \\
a_{n-1,2} & a_{n-2,2} & b_{n-1,2} & b_{n-2,2} \\
a_{n-1,3} & a_{n-2,3} & b_{n-1,3} & b_{n-2,3}
\end{pmatrix} \qquad (1)$$


Figure  3:  System  model 


III.  Algorithm  for  Low  Amount  of  Calculation 

Next, we describe an algorithm that reduces the amount of
calculation in the joint decoder. As an example, we consider the
simplest case, in which a $(7,5)_8$ convolutional code is applied to
the center track only. By using a buffer, the number of input bits
of each track is made the same, even though the code rate of the
center track differs from that of the outer tracks. Therefore, 6 bits
are necessary for 1 path in the trellis diagram. Since the PR4
system and the $(7,5)_8$ convolutional code have the same number
of memory elements, the number of states in the Viterbi decoder
will decrease.
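
As a reminder of what the $(7,5)_8$ code looks like, the following is a minimal sketch of a rate-1/2 feedforward encoder with octal generators 7 and 5. The nonrecursive form and the function name are assumptions, since the abstract does not specify the encoder realization. The encoder has two memory elements, the same number as a PR4 target of the form 1 - D^2, which is the property the paragraph above relies on.

```python
def conv_encode_75(bits):
    """Rate-1/2 feedforward encoder with octal generators 7 (111) and 5 (101)."""
    s1 = s2 = 0                      # two memory elements, so 4 trellis states
    out = []
    for u in bits:
        out.append(u ^ s1 ^ s2)      # generator 7: 1 + D + D^2
        out.append(u ^ s2)           # generator 5: 1 + D^2
        s1, s2 = u, s1               # shift register update
    return out


print(conv_encode_75([1, 0, 0]))     # impulse response of the encoder
```

The impulse response interleaves the two generator sequences 1 1 1 and 1 0 1.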
Abstract — A comparison of various full-surface
demodulation  algorithms  is  presented.  Algorithms 
based  on  an  iterative  approach  achieve  highest  data 
storage  density  at  reasonable  complexity  and  fixed  bit 
error  rate. 

I.  Introduction 

Consider information transmission using a full-surface
paradigm. The user data, u, is represented as a one-dimensional
signal over a two-dimensional index set, i.e. a matrix
of data values, $u_{i,j} \in \{0, 1\}$. The noise-free output of the
channel, denoted v, may be modeled as the two-dimensional
convolution of the user data, u, and the channel model, c,
written v = c * u. Additive white Gaussian noise, n, corrupts
the noise-free output, v, which results in the received signal,
r = v + n. The full-surface maximum-likelihood (ML) demodulator
minimizes the expression $\|r - v\|^2 = \sum_{i,j} (r_{i,j} - v_{i,j})^2$,
where v denotes the noise-free channel output due to channel
input u, i.e. v = c * u, and $\|\cdot\|$ denotes the $L_2$ norm.
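
The ML criterion above can be sketched as an exhaustive search over a small data block. This is only an illustration under stated assumptions: the 3 x 3 blur `c` is a made-up toy channel, the convolution is taken as a full convolution with zero boundary, and the function names are hypothetical. The search cost grows as 2 to the power of the number of bits, which is why it is feasible only for very small blocks.

```python
from itertools import product


def conv2d(c, u):
    """Full two-dimensional convolution v = c * u of two small matrices."""
    rows, cols = len(c) + len(u) - 1, len(c[0]) + len(u[0]) - 1
    v = [[0.0] * cols for _ in range(rows)]
    for a, c_row in enumerate(c):
        for b, c_ab in enumerate(c_row):
            for i, u_row in enumerate(u):
                for j, u_ij in enumerate(u_row):
                    v[a + i][b + j] += c_ab * u_ij
    return v


def ml_demodulate(r, c, shape):
    """Exhaustive search for the binary block u minimizing ||r - c * u||^2."""
    n_rows, n_cols = shape
    best_u, best_metric = None, float("inf")
    for bits in product((0, 1), repeat=n_rows * n_cols):
        u = [list(bits[k * n_cols:(k + 1) * n_cols]) for k in range(n_rows)]
        v = conv2d(c, u)
        metric = sum((r_ij - v_ij) ** 2
                     for r_row, v_row in zip(r, v)
                     for r_ij, v_ij in zip(r_row, v_row))
        if metric < best_metric:
            best_u, best_metric = u, metric
    return best_u, best_metric


c = [[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]]  # toy blur, an assumption
u_true = [[1, 0], [0, 1]]
r = conv2d(c, u_true)                # noise-free received block
u_hat, metric = ml_demodulate(r, c, (2, 2))
print(u_hat, metric)
```

With a noise-free received block the search recovers the transmitted data exactly; adding noise to `r` before the call shows how the decision degrades.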

In the full-surface optical data storage problem, the channel
may be modeled as the truncation of a bivariate Gaussian
blur, i.e. $c_{i,j}$ = exp - exp , where $\delta_x$ and
$\delta_y$ denote separation of bits in the vertical and horizontal
directions and $\sigma_c$ is the physical variance of the channel
point-spread function. The unitless storage density, D, of such a
system is given by D = , where dividing D by $\sigma_c^2$ produces
a physical density measurement. Finally, the signal-to-noise
ratio (SNR) is given by SNR = $10 \log_{10}$ where $\sigma_n$ is the
variance of the noise.

II.  Comparison 

Figure 1 gives a performance comparison of several full-surface
demodulators. For a given point on the graph, data
is stored at the density given by the vertical axis, with the SNR
given by the horizontal axis, and the demodulator output achieves
a bit error rate of $10^{-2}$.

To date, no ML full-surface demodulator with reasonable
complexity has been found. The curve of asterisks is an upper
bound to the ML demodulator and is computed using a brute-force
ML search over data blocks of size four by four.

The other algorithms perform at sub-ML levels. The solid
curve shows the performance of the multitrack Viterbi algorithm
(MVA) due to Krishnamoorthi [1]. The curve with "x"
is MVA applied to received data that has been filtered using
a full-surface equalizer. This equalizer allows for density
improvement only at high SNR when colored noise has a negligible
effect on the ML demodulation criterion.

The combination of ML image processing methods with
threshold decision yields the dashed curve. This is improved
to the curve with circles by applying iterative thresholding,
an idea developed by Kau [2].

Figure 1: Comparison of Full-Surface Demodulators (horizontal axis: SNR [dB])

1 This work was supported by a JSEP Fellowship

The  next  class  of  algorithms  is  based  upon  a  greedy  algo¬ 
rithm.  The  dotted  curve  in  Figure  1  shows  the  performance 
of  a  greedy  algorithm  that  iteratively  chooses  data  bits  to 
minimize  a  local  metric.  It  uses  a  single-point  optimization, 
changing  bits  one  at  a  time.  This  algorithm  is  improved  signif¬ 
icantly  by  using  techniques  of  simulated  annealing  that  avoid 
local  minima  (the  dot-dash  curve).  Expanding  the  greedy  al¬ 
gorithm  to  perform  multipoint  optimizations  of  likely  error 
sequences  produces  the  curve  of  plus  signs.  Finally,  a  ge¬ 
netic  algorithm  based  upon  the  improved  greedy  demodulator 
achieves  the  best  results.  The  curve  of  triangles  uses  an  initial 
population  of  five  estimated  user  data  matrices  and  six  gen¬ 
erations  of  combining  three  individuals  by  majority  decision. 
The  curve  of  upside-down  triangles  improves  on  this  by  using 
a  larger  initial  population  of  six  and  ten  generations  of  natural 
selection  based  upon  the  ML  metric. 
The most common approach for dealing with one-dimensional
error bursts is interleaving. For example, to implement
the correction of bursts of length 4, one can use four different
codewords drawn from a code that corrects t errors, while
encoding, or interleaving, the one-dimensional data sequence as
follows: 123412341234 ···. An alternative way to correct any
t bursts of length up to 4 is to use two different codewords
from a code that corrects 2t errors, while interleaving the
one-dimensional data sequence as follows: 112211221122 ···. This
is an interleaving scheme with two repetitions, in that the same
integer appears (at most) twice within a burst of length 4.
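
The two one-dimensional schemes above can be checked with a short sketch that counts how often a single codeword label occurs inside any burst window. The helper name and the choice of four periods are illustrative assumptions.

```python
from collections import Counter


def max_repetitions(pattern, burst_len, periods=4):
    """Largest count of one codeword label inside any window of burst_len."""
    seq = pattern * periods           # the scheme is periodic
    worst = 0
    for start in range(len(seq) - burst_len + 1):
        window = seq[start:start + burst_len]
        worst = max(worst, max(Counter(window).values()))
    return worst


print(max_repetitions("1234", 4))     # scheme without repetitions
print(max_repetitions("1122", 4))     # scheme with two repetitions
```

A burst of length 4 therefore hits each of the four codewords at most once in the first scheme, and each of the two codewords at most twice in the second.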

While  the  optimal  one-dimensional  interleaving  schemes, 
both  with  and  without  repetitions,  are  straightforward,  in 
two  dimensions,  it  is  not  at  all  obvious  how  to  interleave  a 
minimal  number  of  codewords  so  that  any  burst  of  size  up 
to  t  can  be  corrected.  Most  two-dimensional  burst-correcting 
codes  that  have  been  studied  in  the  literature  so  far  correct 
error bursts of a given rectangular shape, say $t_1 \times t_2$ rectangular
arrays. In this work, we assume that a cluster of errors
can  have  an  arbitrary  shape,  as  long  as  it  maintains  horizon¬ 
tal/vertical  connectivity.  Important  applications  where  the 
correction  of  such  two-dimensional  error  clusters  is  required 
are  optical  recording  and  holographic  storage  [2]. 

Given  the  foregoing  notion  of  a  cluster,  one  may  define 
a  two-dimensional  interleaving  scheme  A(t,r)  of  strength  t  with 
r  repetitions  as  an  infinite  array  of  integers  characterized  by 
the  property  that  every  integer  appears  at  most  r  times  in 
any  cluster  of  size  t.  The  interleaving  degree  of  A(t,r),  denoted 
deg  A(t,  r),  is  the  total  number  of  distinct  integers  contained  in 
the  array.  An  interleaving  scheme  A(t,  r)  is  said  to  be  optimal 
if deg A(t, r) is the minimum possible for the given t and r.
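
The defining property of A(t, r) can be tested by brute force for small t. The sketch below enumerates every horizontally/vertically connected cluster of t cells that contains the origin and reports the largest number of times one integer occurs in such a cluster. The two labellings are hypothetical toy examples, not constructions from this work. Restricting attention to clusters through the origin is sufficient here only because both labellings shift by a constant modulo m under translation.

```python
from collections import Counter


def clusters_through_origin(t):
    """All horizontally/vertically connected sets of t cells containing (0, 0)."""
    found = {frozenset([(0, 0)])}
    for _ in range(t - 1):
        grown = set()
        for cluster in found:
            for (i, j) in cluster:
                for cell in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if cell not in cluster:
                        grown.add(cluster | {cell})
        found = grown
    return found


def max_repetitions(label, t):
    """Largest number of times one integer occurs in a cluster of size t."""
    return max(max(Counter(label(i, j) for i, j in cluster).values())
               for cluster in clusters_through_origin(t))


def lattice5(i, j):
    return (i + 2 * j) % 5      # hypothetical toy labelling with 5 integers


def diagonal3(i, j):
    return (i + j) % 3          # hypothetical toy labelling with 3 integers


print(max_repetitions(lattice5, 3), max_repetitions(diagonal3, 3))
```

In this toy check the first labelling has no repeated integer in any cluster of size 3, while the second repeats an integer at most twice, so it behaves as a scheme of strength 3 with two repetitions using fewer integers.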

Blaum,  Bruck,  and  Vardy  [2]  constructed  optimal  two- 
dimensional  interleaving  schemes  without  repetitions  for  all  t. 
Blaum,  Bruck,  and  Farrell  [1]  generalized  the  two-dimensional 
interleaving  schemes  of  [2]  in  such  a  way  that  each  integer 
appears  at  most  twice  in  any  cluster  of  size  t.  However, 
the methods developed in [1] are limited in their scope and
applicability. On the other hand, it is obvious from the work
of [1, 2] that the problem of constructing A(t,r) to minimize
deg A(t,r) becomes much more challenging for r ≥ 2.

In this work, we introduce the notion of r-dispersion that
turns out to be crucial in the design of two-dimensional
interleaving schemes with repetitions. The r-dispersion may be
thought of as a generalization of the $L_1$-distance to a quantity
that reflects a property of r points for r ≥ 2. For r = 3, 4,
we refer to the corresponding r-dispersion as tristance and
quadristance; efficient methods for computing these dispersions
are presented. We also introduce a special class of interleaving
schemes based on two-dimensional lattices, which we
call lattice interleavers. Lattice interleavers are akin to linear
codes in coding theory: both classes are distinguished by the
fact that a certain linearity property is imposed on their
structure. So far, all the best-known interleaving schemes, with or
without repetitions, belong to the class of lattice interleavers.

We construct lattice interleavers A(t, 2) for all t, and
compute the corresponding tristance. We also derive lower bounds
which show that our constructions are optimal for even t.
Finally, we develop the methodology for an elaborate computer
search that produces optimal lattice interleavers with two
repetitions for all t < 161. These results support our conjecture
that the lattice interleavers A(t, 2) constructed in this work
are, in fact, optimal for all values of t, both even and odd.

We present analogous constructions, bounds, and computer
search for lattice interleavers with three repetitions, and prove
that our constructions are optimal for t ≡ 0 mod 9, and
asymptotically optimal for other t. The computer search
yields optimal lattice interleavers A(t, 3) for t ≤ 180. We
conjecture that for all higher values of t, optimal lattice
interleavers may be obtained from our construction.

For r = 4, we construct lattice interleavers for all t, and
compute their 5-dispersion. Although we do not have lower
bounds in this case, we conjecture that these lattice
interleavers are optimal, except for t = 4, 5, 8, 53, 70. The
computer search confirms this conjecture up to t = 221. For higher
values of r, we exhibit certain infinite families of lattice
interleavers, such as $A(rk, r)$ and $A(r^2 k, r)$ for all $k \in \mathbb{Z}^+$, and
compute the interleaving degree in each case. These families make
it possible to establish general asymptotic results for large r.

We  also  consider  interleaving  schemes  for  an  alternative 
cluster  connectivity  model.  Namely,  we  assume  that  two  el¬ 
ements  in  an  array  are  connected  if  they  are  adjacent  hor¬ 
izontally,  vertically,  or  diagonally.  We  show  that  there  is 
a  tight  relation  between  interleaving  schemes  for  this  connec¬ 
tivity  model  and  interleaving  schemes  for  the  standard  hori¬ 
zontal/vertical  connectivity  model. 

Finally, we consider the following problem: What is the
largest shape $S \subset \mathbb{Z}^2$ such that the tristance between any three
points of S is at most t? Our solution to this problem leads to
lower bounds on the interleaving degree of A(t, 2) for general
(nonlattice) interleavers. These bounds improve substantially
upon the earlier results of Blaum, Bruck, and Farrell [1].
Abstract — This paper provides an information-
theoretic  perspective  on  the  use  of  transmit  antenna 
arrays  when  the  transmitter  has  imperfect  channel 
feedback.  The  gains  obtained  are  found  to  be  substan¬ 
tial,  in  contrast  with  the  meager  gains  due  to  feedback 
reported  in  previous  work  on  single  antenna  systems. 

I.  Introduction 

Antenna  arrays  at  the  transmitter  are  widely  recognized  as 
an  effective  means  of  improving  the  capacity  and  reliability 
of  a  wireless  communication  link.  There  are  two  key  tech¬ 
niques  that  have  been  proposed  in  the  literature  for  exploiting 
transmit  antenna  arrays:  space-time  coding,  which  requires  no 
knowledge  of  the  spatial  channel  on  the  part  of  the  transmit¬ 
ter,  and  transmit  beamforming  techniques,  which  assume  that 
the  transmitter  has  accurate  knowledge  of  the  channel  through 
feedback from the receiver. For a typical time-varying channel,
however,  the  feedback  available  to  the  transmitter  will  be  of 
intermediate  quality,  and  one  would  expect  that  the  transmit¬ 
ter  strategy  in  such  situations  would  be  some  blend  of  space- 
time  coding  and  beamforming.  Our  purpose  in  this  paper  is  to 
make  this  intuition  precise  by  providing  information-theoretic 
insights  into  the  appropriate  transmitter  strategies  when  the 
channel  feedback  available  to  the  transmitter  is  imperfect. 

II.  Problem  Statement 

It  is  assumed  that  the  transmit  antenna  has  M  elements,  and 
that  the  receive  antenna  has  a  single  element.  Consider  a  dis¬ 
crete  time  system,  where  the  channel  coefficients  from  the  M 
transmit  elements  to  the  receive  element  at  time  t  are  denoted 
by  the  M  x  1  complex  vector  h(t).  We  consider  the  following 
abstraction  to  model  partial  knowledge  of  the  channel  at  the 
transmitter. 

Problem Set-Up: The transmitter knows that the channel
h has a complex Gaussian distribution with mean $\mu$ and
covariance $\Sigma$, denoted by $\mathcal{N}(\mu, \Sigma)$. The input to the channel is
x. The receiver knows h, and receives

$$y = x^H h + n$$

where $n \sim \mathcal{N}(0, \sigma^2)$ is circular complex Gaussian noise with
variance $\sigma^2/2$ per dimension.

Problem: What is the input distribution p(x) that maximizes
the mutual information $I(x; y)$, subject to $E\{\|x\|^2\} \le P$?

The  preceding  abstraction  can  be  related  to  a  specific  model 
for  channel  feedback  considered  recently  in  the  literature 
[1],  for  which  the  maximizing  input  distribution  achieves  the 
Shannon  capacity  of  the  forward  link. 

1 This work was supported by Motorola under the University
Partnerships  in  Research  Program. 


Upamanyu  Madhow 
Department  of  Electrical  and 
Computer  Engineering 
University  of  California 
Santa  Barbara,  CA,  93106  USA 
e-mail:  madhow@ece.ucsb.edu 

It can be shown [2] that the maximizing input distribution
is complex special Gaussian, $x \sim \mathcal{N}(0, Q)$. The optimization
problem is now one of finding the optimum choice of the
covariance matrix $Q^o$ maximizing the mutual information for
power constraint P, and the optimization problem can be
restated as follows:

$$\max_{Q} \; E_h\left\{ \log\left( \frac{h^H Q h}{\sigma^2} + 1 \right) \right\} \qquad (1)$$

subject to the power constraint trace(Q) = P, where $\sigma^2$ is the
variance of the additive circular complex Gaussian noise. The
expectation in (1) is computed using the $\mathcal{N}(\mu, \Sigma)$ distribution
for h.

III.  Overview  of  Results 

Presently, the solution to the optimization problem in (1)
for the general form of $h \sim \mathcal{N}(\mu, \Sigma)$ is not known. In this
work, the optimum distribution is characterized in the following
two cases:

1. Mean Feedback: In this case, the channel distribution
is modeled at the transmitter as $h \sim \mathcal{N}(\mu, \alpha I)$, so that
the feedback provides noisy information regarding the
current channel realization. It is shown that the optimum
solution is to use beamforming along $\mu$ (Q is unit
rank) when the feedback SNR is larger than a threshold,
and to use M-fold diversity (Q is full rank) otherwise.
In the latter case, the most power is put in the direction
$\mu$, while the remaining M - 1 orthogonal directions
receive equal (but lower) powers.

2. Covariance Feedback: The channel distribution known
to the transmitter is $h \sim \mathcal{N}(0, \Sigma)$. This models a
situation in which the channel may be varying too rapidly
for the feedback to give an accurate estimate of the current
channel value. However, the relative geometry of
the propagation paths changes more slowly, and is
reflected in the covariance matrix $\Sigma$. The optimum solution
here is shown to consist of independent Gaussian
inputs along (a subset of) the M eigenvectors of $\Sigma$. The
solution resembles water pouring, in that eigenvectors
corresponding to larger eigenvalues receive more power.
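
The trade-off in the mean feedback case can be illustrated numerically. The sketch below is a Monte Carlo estimate of the objective in (1) for two fixed choices of Q, under assumptions that are not part of the paper: Q is taken diagonal in a basis whose first axis is the direction of the channel mean, the full-rank alternative uses equal powers (a simplification of the solution described above), and all names and parameter values are hypothetical. It does not compute the feedback SNR threshold.

```python
import math
import random


def mean_log_rate(powers, mu_norm, alpha, sigma2=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of E_h{ log(h^H Q h / sigma^2 + 1) }.

    Assumes Q is diagonal in a basis whose first axis is the direction of the
    channel mean, so h^H Q h = sum_i powers[i] * |h_i|^2 with
    h_1 ~ N(mu_norm, alpha) and h_i ~ N(0, alpha) for i > 1.
    """
    rng = random.Random(seed)
    std = math.sqrt(alpha / 2.0)      # alpha/2 variance per real dimension
    total = 0.0
    for _ in range(trials):
        gain = 0.0
        for i, p in enumerate(powers):
            re = rng.gauss(mu_norm if i == 0 else 0.0, std)
            im = rng.gauss(0.0, std)
            gain += p * (re * re + im * im)
        total += math.log(1.0 + gain / sigma2)
    return total / trials


M, P = 4, 1.0
beamforming = [P] + [0.0] * (M - 1)   # unit-rank Q along the mean
diversity = [P / M] * M               # full-rank Q with equal powers

for mu_norm, alpha in ((3.0, 0.01), (0.0, 1.0)):
    print(mu_norm, alpha,
          mean_log_rate(beamforming, mu_norm, alpha),
          mean_log_rate(diversity, mu_norm, alpha))
```

With a strong mean and small uncertainty the unit-rank choice gives the larger estimate, while with zero mean the equal-power choice does, which is consistent with the qualitative behavior described in the mean feedback case.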

References 

[1] G. Caire and S. Shamai, On the capacity of some channels with
channel state information, IEEE Transactions on Information
Theory, vol. 45, pages 2007-2019, September 1999.

[2]  I.  E.  Telatar,  Capacity  of  multi-antenna  Gaussian  channels, 
Tech.  Rep.  BL0112170-950615-07TM,  AT&T  Bell  Labs,  1995. 


0-7803-5857-0/00/$10.00 ©2000 IEEE.



ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


Space-Time  Autocoding:  Arbitrarily  Reliable  Communication 

in  a  Single  Fading  Interval 

Thomas L. Marzetta, Bertrand Hochwald and Babak Hassibi
Mathematical  Sciences  Center 
Lucent  Technologies 
600  Mountain  Avenue 
Murray  Hill,  NJ  07974 

e-mail: {tlm,hochwald,hassibi}@research.bell-labs.com


Abstract  —  Prior  treatments  of  space-time  commu¬ 
nications  in  Rayleigh  flat  fading  generally  assume  that 
channel  coding  covers  either  one  fading  interval — in 
which  case  there  is  a  nonzero  “outage  capacity” — 
or  multiple  fading  intervals — in  which  case  there  is 
a  nonzero  Shannon  capacity.  However,  we  establish 
conditions  under  which  channel  codes  span  only  one 
fading  interval  and  yet  are  arbitrarily  reliable.  In 
short,  space-time  signals  are  their  own  channel  codes. 
We  call  this  phenomenon  space-time  autocoding,  and 
the  accompanying  capacity  the  space-time  autocapac¬ 
ity. 

Let an M-transmitter-antenna, N-receiver-antenna
Rayleigh flat fading channel be characterized by an
M × N matrix of independent propagation coefficients,
distributed as zero-mean, unit-variance complex
Gaussian random variables. This propagation
matrix is unknown to the transmitter, remains constant
during a T-symbol coherence interval, and there
is a fixed total transmit power. Let the coherence interval
and number of transmitter antennas be related
as T = βM for some β. A T × M matrix-valued signal,
associated with R·T bits of information for some rate
R, is transmitted during the T-symbol coherence interval.
Then there is a positive space-time autocapacity
$C_a$ such that for all R < $C_a$, the block probability of
error goes to zero as the pair (T, M) → ∞ such that
T/M = β. The autocoding effect occurs whether or
not the propagation matrix is known to the receiver,
and $C_a = N \log(1 + \rho)$ in either case, independently of
β, where ρ is the expected SNR at each receiver antenna.
Lower bounds on the cutoff rate derived from
random Unitary Space-Time signals suggest that the
autocoding effect manifests itself for relatively small
values of T and M. For example, within a single coherence
interval of duration T = 16, for M = 7 transmitter
antennas and N = 4 receiver antennas, and an 18 dB
expected SNR, a total of 80 bits (corresponding to
rate R = 5) can theoretically be transmitted with a
block probability of error less than $10^{-9}$, all without
any training or knowledge of the propagation matrix.

A  complete  copy  of  this  paper  is  available  on  the 
web  at  http://mars.bell-labs.com. 
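
The numerical example in the abstract can be checked against the autocapacity formula with a few lines of arithmetic. The sketch assumes a base-2 logarithm, so that $C_a$ is in bits per symbol, and converts the 18 dB expected SNR to a linear ratio; the function name is illustrative.

```python
import math


def autocapacity_bits(n_rx, snr_db):
    """C_a = N log2(1 + rho), with rho given in dB; base-2 log is an assumption."""
    rho = 10.0 ** (snr_db / 10.0)
    return n_rx * math.log2(1.0 + rho)


T, R, N, snr_db = 16, 5, 4, 18.0
c_a = autocapacity_bits(N, snr_db)
print(f"C_a = {c_a:.2f} bits/symbol, payload = {R * T} bits, R < C_a: {R < c_a}")
```

Under this assumption the quoted rate R = 5 lies well below the autocapacity of roughly 24 bits per symbol for N = 4 at 18 dB.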
Abstract — Sufficient conditions ensuring that QAM
space-time codes achieve full space diversity in quasi-static
fading channels are presented. The conditions
are on code words or generator matrices instead of
on every code word pair. This greatly simplifies the
construction of full space diversity codes.

I.  Introduction 

For  wireless  communication,  the  design  goal  of  so  called 
“space-time”  codes  [1]  is  to  take  advantage  of  both  the  spa¬ 
tial  diversity  provided  by  multiple  antennas  and  the  temporal 
diversity  available  with  time-varying  fading. 

In a quasi-static Rayleigh fading channel, in order for a
space-time code to achieve full space diversity, every code
symbol difference matrix needs to have full rank over the complex
number field. However, the code is not linear over the complex
number field. This discrepancy is a serious obstacle in the
design. The paper by Hammons and El Gamal [2] represents
an important first step to bridge this discrepancy by providing
a binary rank criterion for binary BPSK codes and $\mathbb{Z}_4$ QPSK
codes to ensure full space diversity.

We provide a theory for the design of space-time codes
in quasi-static Rayleigh fading channels with a higher order
constellation ($2^{2k}$-QAM) [3]. It includes the BPSK binary
rank criterion in [2] as a special case. For the QPSK constellation,
it is applicable to GF(4) codes instead of $\mathbb{Z}_4$ codes.

Applications  of  the  theory,  such  as  analysis  of  existing 
space-time  codes,  constructions  of  new  space-time  codes  from 
traditional  codes  and  turbo  codes  will  be  presented.  Only  the 
main  theorems  are  given  in  this  abstract. 

II. Σ0-Rank Criterion

The full space diversity rank criteria developed in [3] are for
codes defined on the ring $\mathbb{Z}_{2^k}(j)$, the ring $\mathbb{Z}_{2^k}$ adjoined with
the element j which satisfies $j^2 = \ominus 1$. In the sequel, $\oplus$ is used
to denote modulo-$2^k$ addition.

Definition 1 (Linear $\mathbb{Z}_{2^k}(j)$ Code with Translation
Mapping) A linear $\mathbb{Z}_{2^k}(j)$ code C is a set of code words which
form an additive group. Each code word J is an $N_c$ by $L_t$
matrix with elements in the ring $\mathbb{Z}_{2^k}(j)$. Each code word matrix
J is mapped to a complex code symbol matrix D by the
translation $D_i(j) = J_i(j) - ((2^k - 1)/2 + j(2^k - 1)/2)$ on the
element of the ith column and jth row, for all i and j. This results in
a $2^{2k}$-QAM constellation.
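
The translation mapping of Definition 1 can be sketched for a single ring element. The representation of an element a + jb as a pair of integers (a, b) and the function names are assumptions made for illustration.

```python
def translate(a, b, k):
    """Map a + j*b in Z_{2^k}(j) to a complex QAM point (Definition 1)."""
    shift = (2 ** k - 1) / 2
    return complex(a - shift, b - shift)


def constellation(k):
    """All 2^(2k) points obtained from the ring elements a + j*b."""
    size = 2 ** k
    return [translate(a, b, k) for a in range(size) for b in range(size)]


points = constellation(2)            # k = 2 gives 16-QAM
print(len(points), sorted({p.real for p in points}))
```

For k = 2 the mapping yields 16 points centered at the origin with coordinates -1.5, -0.5, 0.5 and 1.5 on each axis.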

Definition 2 (Σ0-Coefficients) Coefficients $\alpha_1, \alpha_2, \ldots, \alpha_L$
in $\mathbb{Z}_{2^k}(j)$ are said to be Σ0-coefficients if there exists $i^*$
such that $a_{i^*} + b_{i^*}$ is odd, where $a_i \oplus j b_i = \alpha_i$.

1 This work was supported by National Science Foundation under
Grant  NCR-9706372. 
Definition 3 (Column Σ0-Rank) A matrix V over the ring
$\mathbb{Z}_{2^k}(j)$ has column Σ0-rank L if L is the maximum number of
column vectors of V such that

$$\exists\, \mathcal{V} = \{V_{i_1}, \ldots, V_{i_L}\}, \quad \bigoplus_{l=1}^{L} \alpha_l V_{i_l} \neq 0,$$

for any Σ0-coefficients $\alpha_1, \alpha_2, \ldots, \alpha_L$.

The row Σ0-rank can be similarly defined. Since the column
Σ0-rank and the row Σ0-rank are equal [3], they are called the
Σ0-rank.

Definition 4 (Full Σ0-Rank) An m by n matrix V over the
ring $\mathbb{Z}_{2^k}(j)$ is said to be of full Σ0-rank if it has Σ0-rank equal
to the minimum of m and n.

The  sufficient  conditions  on  code  words  are  given  first. 

Theorem 1 (Σ0-Rank Criterion) Let C be a linear $\mathbb{Z}_{2^k}(j)$
code with translation mapping to a $2^{2k}$-QAM constellation. If
every nonzero code word $J \in C$ has full Σ0-rank, then C
achieves full space diversity.

For  linear  codes,  the  conditions  can  be  translated  into  the 
conditions  on  the  generator  matrices. 

Theorem 2 Let C be a linear $\mathbb{Z}_{2^k}(j)$ code. The ith column
of the code word matrix is defined as

$$J_i = G_i I, \qquad (1)$$

where I is the information sequence in $\mathbb{Z}_{2^k}(j)$ and $G_i$ is the
generator matrix for the ith antenna. If for all Σ0-coefficients
$\alpha_1, \alpha_2, \ldots, \alpha_{L_t}$ and for all nonzero information sequences I,

$$\Big(\bigoplus_{i} \alpha_i G_i\Big) I \neq 0, \qquad (2)$$

then every nonzero $J \in C$ is of full Σ0-rank. Thus, the code
achieves full space diversity.
Abstract — An EM-based algorithm is introduced
for decoding space-time trellis codes without assuming
channel knowledge. Its complexity is much
smaller than a direct evaluation of the log-likelihood
function, and simulation results indicate that it
achieves a performance close to that of a receiver that
knows the channel perfectly.

I.  Introduction  and  System  Model 

Tarokh,  Seshadri  and  Calderbank  recently  proposed  trellis- 
based  space-time  codes  [1]  which  combine  signal  processing 
at  the  receiver  and  coding  appropriate  to  multiple  transmit 
antennas.  These  so-called  space-time  codes  perform  well  in 
slowly-fading channels, assuming perfect channel state information
(CSI) at the receiver. In the presence of channel
mismatch, however, system performance suffers a significant
degradation [2]. In this paper we look at the problem
of maximum-likelihood sequence estimation for space-time
coded systems without assuming channel knowledge. An
expectation-maximization  (EM)  algorithm  [3]  is  derived  for 
the  sequence  estimation  problem  and  is  shown  by  simulations 
to  perform  close  to  the  performance  of  a  maximum  likelihood 
decoder  that  assumes  perfect  CSI. 

We consider N transmit and M receive antennas. Data
blocks of length L are encoded by a space-time encoder. The
transmitted code block can be described by a matrix D, whose
entry $d_{ln}$ is the complex symbol transmitted by the n-th
antenna during the l-th symbol time and whose row vectors are
denoted by $D_l$. The fading channel between the transmit and
receive antenna arrays is described by a matrix $\Gamma$ whose entry
$\gamma_{ij}$ denotes the complex Gaussian fading gain in the path
from the i-th transmit to the j-th receive antenna. We assume
the fading processes of different paths (transmit and receive
antenna pairs) are independent. The column vectors of $\Gamma$ are
denoted by $\Gamma_j$, which represents the vector of fading coefficients
viewed by the j-th receive antenna. The complex matched-filter
outputs over the length-L transmitted block at each of
the M receive antennas are represented by a matrix Y, whose
entry $y_{lj}$ denotes the output at the j-th antenna at time instant l:
$Y = D\Gamma + N$, where N is the AWGN term. We denote the
column vectors of Y by $Y_j$.

II.  The  EM-Based  Receivers 

To apply the EM algorithm, we choose the fading parameter
vectors $\Gamma_j$ (the columns of the channel matrix) as the missing
data. Thus the expectation step of the EM algorithm yields

L  M 

Q(D|D*)  =  £  [«  [Wi DifJ)  -  3D, AjD?]  , 

<=i  j= l 

where 

fJ=(p*)-D‘  +  §25)'  (!>*)•  Y„ 


Figure 1: The EM and "genie" decoders: N = 2, M = 2,
L = 128, 8-state code, 4 pilot symbols, 3 iterations

=  I  -  ((Dfcr  D*  +  -L-)  1  (D*y  Dfc  +  r}(rjr 

The  maximization  step  yields 

Dfc+1  =  arg  max  [»  (jJjDjfJ)  -  iD,fljDf]  . 

i=i  j= l  L 

III.  Performance 

We  use  the  8-state  QPSK  code  introduced  in  [1]  to  study 
the  performance  of  the  EM-based  algorithm.  Pilot  symbols 
are  inserted  into  the  data  stream  to  initialize  the  algorithm. 
The  maximization  step  of  the  EM  algorithm  is  efficiently  per¬ 
formed  using  the  Viterbi  algorithm.  Figure  1  shows  simula¬ 
tion  results  for  the  frame-error  probability  for  the  EM-based 
algorithm,  the  “genie”  receiver  that  assumes  perfect  channel 
knowledge,  and  a  receiver  that  first  estimates  the  channel. 
In  the  simulations  for  the  channel  mismatch  case,  eight  pilot 
symbols  were  inserted  in  each  frame  to  estimate  the  channel. 
It is clear that the EM decoder performs close to the "genie
bound", while in the channel mismatch case, a loss of about 1 dB
occurs at a frame error rate of 0.1. At higher SNR, the
performance loss becomes even larger.
Abstract — We analyze phase trajectories of the
turbo  decoding  algorithm  as  a  function  of  the  signal- 
to-noise  ratio  (SNR).  We  prove  the  existence  of  fixed 
points  not  only  at  asymptotically  high  SNRs  but  also 
at  asymptotically  low  SNRs.  Fixed  points  at  practical 
SNRs  are  empirically  divided  into  two  classes:  inde¬ 
cisive  fixed  points  which  usually  lead  to  numerous  er¬ 
roneous  decisions  and  unequivocal  fixed  points  which 
usually  correspond  to  correct  decisions.  The  water¬ 
fall  region  in  the  performance  curve  of  turbo  decoding 
is  characterized  as  the  region  of  transition  from  con¬ 
vergence  to  indecisive  fixed  points  to  convergence  to 
unequivocal  fixed  points. 

I.  Introduction 

We consider classical turbo codes, transmitted over an additive
white Gaussian noise channel using binary phase-shift
keying modulation. The corresponding turbo decoding
algorithm can be viewed as a discrete dynamical system [3]. This
dynamical system iteratively updates two probability densities
on information bits, commonly known as the extrinsic
information, provided by the two constituent decoders of
the turbo decoding algorithm.

As a dynamical system, the turbo decoding algorithm can
have a variety of phase trajectories. A phase trajectory may
converge to a fixed point, reach a well-defined invariant set,
or simply wander in the high-dimensional space of extrinsic
information. At present, precious little is known about the
characteristics of these phase trajectories. For example, in
many cases, the turbo decoding algorithm does not converge
after a fixed number (say 18) of iterations. Is it possible that
in the majority of such cases the decoding algorithm actually
converges, albeit only after a large number of iterations? Or
is the opposite true: in the majority of such cases, the
decoding will never converge? It has been observed that the turbo
decoding algorithm always converges at high SNRs. What
happens at (asymptotically) low SNRs: does the algorithm
converge or does it wander ad infinitum? These are some of
the basic questions answered in this work.

II.  Fixed  Points  at  Asymptotic  SNRs 
Using a set of sufficient conditions provided by Richardson [3],
we show [1] that at asymptotically low SNRs, with high
probability, the turbo decoding algorithm has a unique fixed point.
The extrinsic information that corresponds to this fixed point
is close to the uniform distribution on information bits. That
is, the fixed point votes almost equally in favor of the two
possible values for each transmitted information bit.

On  the  other  hand,  we  show  that  at  asymptotically  high 
SNRs,  with  high  probability,  the  turbo  decoding  algorithm 
has  fixed  points  that  correspond  to  the  transmitted  codeword. 
Moreover, starting from unbiased initialization, the turbo
decoding algorithm will converge to one of these fixed points.
The derivation of this result indicates that extrinsic
information corresponding to such fixed points is concentrated on the
transmitted information bits.

III.  Fixed  Points  at  Practical  SNRs 
The existence of certain fixed points at asymptotic SNRs
raises interesting questions. Does the turbo decoding
algorithm, starting from unbiased initialization, converge to these
fixed points? If so, what are the threshold values of SNR
beyond which the turbo decoding algorithm converges?

To answer these questions, we performed extensive
simulations. Empirically, we found that the turbo decoding
algorithm converges to two types of fixed points: indecisive fixed
points and unequivocal fixed points. The algorithm converges
to indecisive fixed points for SNRs that are below the waterfall
region, and to unequivocal fixed points for SNRs above
the waterfall region. The empirically observed characteristics
of indecisive and unequivocal fixed points match closely the
characteristics predicted by our analysis for asymptotically
low and asymptotically high SNRs, respectively.

For SNRs in the waterfall region, the decoding algorithm
may or may not converge, and in some cases, the phase
trajectory may become quasi-periodic or periodic.

IV.  Continuation  of  Fixed  Points 

For sufficiently long turbo codes, we can treat the turbo
decoding algorithm as a single-parameter dynamical system,
parameterized (approximately) by the SNR. This allows us to
trace the movement of fixed points (more precisely, obtain the
equilibrium curves of fixed points) as the SNR is changed.

The equilibrium curves, parameterized (approximately) by
the SNR, reveal that unequivocal fixed points barely move as
the SNR is changed from very high to very low values.
However, starting from the very low values, indecisive fixed points
move substantially as the SNR is increased while becoming less
and less stable. Ultimately, for SNRs in the waterfall region,
indecisive fixed points bifurcate and disappear. All three types
of bifurcation, studied in classical bifurcation theory [2], occur
in turbo decoding. This explains the quasi-periodic and
periodic behavior of the phase trajectories in the waterfall region.
Abstract — We prove the existence of thresholds for
turbo codes [1] and we prove concentration of the
performance of turbo codes within the ensemble
determined by the random interleaver. In effect, we show
that the results obtained in [2] and [3] for low-density
parity-check codes extend to turbo codes. The main
technical innovation is to rigorously show that the
dependence of output extrinsic information on input priors
decays with distance along the trellis. In an infinitely
long turbo code the densities of the extrinsic
information fulfill a certain symmetry condition which we call
the consistency condition. This condition provides the
basis for an efficient Monte-Carlo algorithm for the
determination of thresholds for turbo codes.
Thresholds of all symmetric parallel concatenated codes of
memory up to 6 have been determined.


I.  Introduction 


We determine the asymptotic (in length) performance of
turbo codes under iterative decoding. The analysis is based
on the techniques introduced in [2, 3, 4] in the context of
low-density parity-check (LDPC) codes, extended here to turbo
codes.

Assume  we  have  the  following  setup. 


1. A family of binary-input output-symmetric memoryless
channels ordered by physical degradation and indexed
by a real parameter $\sigma$, e.g., the class of binary symmetric
channels (BSC), the class of additive white Gaussian
noise channels (AWGNC) or the class of Laplace
channels (LC).


2. For every integer $n$ we define an ensemble of turbo codes
$C_n$ in the following manner. We first fix the two rational
functions $G_1(D) = p_1(D)/q_1(D)$ and $G_2(D) = p_2(D)/q_2(D)$, which
describe the recursive convolutional encoding functions.
For $x \in \{\pm 1\}^n$ let $\gamma_i(x)$, $i = 1, 2$, denote the
corresponding encoding functions. Then for a given permutation $\pi$
on $n$ letters the unpunctured codewords of a standard
parallel turbo code have the form $(x, \gamma_1(x), \gamma_2(\pi(x)))$.
We will assume a uniform probability distribution on
the set of such permutations.


There exists a threshold $\sigma^*$ with the property that if $n$ is
large enough then for almost any code from the ensemble $C_n$
the probability of bit error is below any desired level if
transmission takes place over a channel with $\sigma < \sigma^*$, and the bit
error probability is bounded away from zero if we transmit
over a channel with $\sigma > \sigma^*$.

We use a result from the theory of products of positive
random matrices to prove that dependencies in the trellis
decay with distance. This implies that constituent decoding is
essentially local in the trellis and can be approximated
arbitrarily well by finite window turbo decoding [5]. Once one
restricts to windowed decoding the proof goes through much
as for LDPC codes: an edge exposure martingale argument
proves concentration of performance around the mean. For
any fixed number of iterations the graph determining output
extrinsic information is asymptotically a tree with high
probability, so the mean converges to the performance of such a
tree. By taking limits one obtains the corresponding result
for non-windowed, standard, turbo decoding.

To date we know of no numerical algorithm to calculate
thresholds which has efficiency comparable to the LDPC code
case. To determine thresholds we simulate, in effect, an
infinitely long turbo code. If $P$ is the distribution of the priors,
then the one-sided state distributions, usually denoted $\alpha$ and
$\beta$, converge to steady-state distributions along an infinitely
long trellis. The output extrinsic information is determined
by these steady-state distributions and the distribution of
the channel data. We use Monte-Carlo techniques to estimate
the various distributions. These calculations are significantly
improved in both speed of convergence and accuracy by
exploiting a provable symmetry property of extrinsic information
distributions in infinitely long turbo codes. Let $f(x)$ be
a distribution of extrinsic information in log-likelihood
representation for an infinitely long turbo code, assuming the
all-0 codeword. Then $f(x) = f(-x)e^{x}$. This consistency
condition implies that $f(x)$ is determined by the distribution of $|x|$,
which is much easier to estimate accurately via direct
simulation.
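The consistency condition can be checked numerically on a simple case. The sketch below is not the Monte-Carlo procedure of the paper; it assumes the familiar channel LLR density for BPSK over the AWGNC (Gaussian with mean $2/\sigma^2$ and variance $4/\sigma^2$) as a stand-in for an extrinsic information density, and the function names are hypothetical. It verifies $f(x) = f(-x)e^{x}$ and shows how $f$ is recovered from the density $g$ of $|x|$: since $g(a) = f(a) + f(-a) = f(a)(1 + e^{-a})$ for $a > 0$, one gets $f(x) = g(|x|)/(1 + e^{-x})$.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def llr_density(x, sigma):
    """Channel LLR density for y = 1 + sigma * Z (all-zero codeword sent as +1).

    The LLR equals 2 * y / sigma**2, so it is Gaussian with mean
    m = 2 / sigma**2 and variance 2 * m (an assumed stand-in for f).
    """
    m = 2.0 / sigma ** 2
    return gaussian_pdf(x, m, 2.0 * m)


def density_from_abs(abs_density, x):
    """Recover f(x) from the density g of |x| under f(x) = f(-x) * exp(x)."""
    return abs_density(abs(x)) / (1.0 + math.exp(-x))


if __name__ == "__main__":
    sigma = 0.94  # illustrative noise level
    f = lambda x: llr_density(x, sigma)
    g = lambda a: f(a) + f(-a)  # density of |x| for a > 0
    for x in (-3.0, -0.5, 0.7, 2.0, 5.0):
        print(x, f(x), f(-x) * math.exp(x), density_from_abs(g, x))
```

In a simulation one would estimate $g$ from samples of $|x|$ only, which avoids having to resolve the thin negative tail of $f$ directly.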


m    code        $\sigma^*$
2    (5,7)       0.883
3    (11,13)     0.93
4    (17,31)     0.94
5    (31,45)     0.94
6    (41,107)    0.941

Table 1: The highest threshold of standard parallel
concatenated codes of rate 1/2 up to memory 6 for the
AWGNC: $y = x + n$, where $x = \pm 1$ and $n$ is $\sigma\,\mathcal{N}(0, 1)$.


References 

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon
limit error-correcting coding and decoding,” in Proceedings of
ICC'93, (Geneve, Switzerland), pp. 1064-1070, May 1993.

[2] T. Richardson and R. Urbanke, “The Capacity of Low-Density
Parity-Check Codes under Message Passing Decoding,”
submitted IEEE IT, 1999.

[3] T. Richardson, A. Shokrollahi, and R. Urbanke, “Design of
Provably Good Low-Density Parity-Check Codes,” submitted IEEE
IT, 1999.

[4] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman,
“Analysis of low density codes and improved designs using
irregular graphs,” in Proceedings of the 30th Annual ACM
Symposium on Theory of Computing, pp. 249-258, 1998.

[5] N. Wiberg, “Codes and Decoding on General Graphs,”
Linköping University, S-581 83, Linköping, Sweden, 1996.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT 2000, Sorrento, Italy, June 25-30, 2000


Gaussian  Approximation  for  Sum-Product  Decoding  of  Low-Density 

Parity-Check  Codes 


Sae-Young Chung
Laboratory for Information and
Decision Systems, M.I.T.
e-mail: sychung@lids.mit.edu


Rudiger Urbanke
Communications Theory Lab,
EPFL, Lausanne, Switzerland
e-mail: Rudiger.Urbanke@epfl.ch


Thomas J. Richardson
Lucent Technologies,
Murray Hill, NJ
e-mail: tjr@lucent.com


Abstract — We use a Gaussian approximation (GA)
for analyzing the sum-product algorithm for
low-density parity-check (LDPC) codes and memoryless
binary-input continuous-output additive white Gaussian
noise (AWGN) channels. This simplification
allows us to calculate the threshold quickly and to
understand the behavior of the decoder better. We have
also designed high-rate LDPC codes using the GA
that have thresholds less than 0.05 dB from the Shannon
limit.

I.  Introduction 

For many interesting channels and iterative decoders,
LDPC codes exhibit a threshold phenomenon: as the block
length tends to infinity, an arbitrarily small bit error
probability can be achieved if the noise level is smaller than a
certain threshold, whereas for a noise level above the threshold
the probability of bit error is larger than a positive constant [1].

In this paper, we present a simple method to estimate the
thresholds of randomly constructed irregular LDPC codes for
memoryless binary-input continuous-output AWGN channels
under sum-product decoding. This method is based on
approximating densities of log-likelihood ratio (LLR) messages
as Gaussian mixtures. We assume that, for each variable node, the
graph is a tree up to a certain depth, as validated by the
general concentration theorem [2].

II.  Gaussian  Approximation 

If all incoming messages of a variable node are Gaussian,
then the resulting extrinsic information distribution is
also Gaussian because it is the sum of independent Gaussian
random variables. Numerical results using density evolution
(DE) [1] show that the extrinsic information distributions from
both variable and check nodes are very close to Gaussian even
though the inputs are not. From now on, we assume all
extrinsic information distributions are Gaussian. By enforcing
the consistency condition [2] at each iteration, we can greatly
improve the accuracy of the approximation and reduce the DE
problem to a one-dimensional one.

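Let $\lambda(x) = \sum_{i=2}^{d_l} \lambda_i x^{i-1}$ and $\rho(x) = \sum_{j=2}^{d_r} \rho_j x^{j-1}$ be the
degree sequences for the variable and check nodes, respectively.
For $0 < s < \infty$ and $0 \le t < \infty$, we define $f(s, t)$ as

$$f(s,t) = \sum_{j=2}^{d_r} \rho_j \, \psi^{-1}\!\left( \left[ \sum_{i=2}^{d_l} \lambda_i \, \psi\big(s + (i-1)t\big) \right]^{j-1} \right),$$

where $\psi(x)$ is defined by

$$\psi(x) = \begin{cases} \dfrac{1}{\sqrt{4\pi x}} \displaystyle\int_{\mathbb{R}} \tanh\frac{u}{2} \, e^{-\frac{(u-x)^2}{4x}} \, du & \text{if } x > 0, \\ 0 & \text{if } x = 0. \end{cases}$$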

The message update rule now becomes $t_l = f(s, t_{l-1})$, where
$s = m_{u_0}$ is the mean of $u_0$ and $t_l$ is the ensemble mean of the
output messages of check nodes at the $l$-th iteration. The initial
value $t_0$ is 0. Note that since $t_1 = f(s, 0) > 0$ for $s > 0$, the
iteration will always start.

We define the threshold $s^*$ as the infimum of all $s$ in $\mathbb{R}^+$
such that $t_l(s)$ converges to $\infty$ as $l \to \infty$. By finite induction,
we conclude that if $s > s^*$, $t_l(s)$ converges to $\infty$. The following
lemma shows an alternative interpretation of the threshold.

Lemma 1 $t_l(s)$ will converge to $\infty$ iff

$$t < f(s,t), \quad \forall t \in \mathbb{R}^+. \qquad (1)$$
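The one-dimensional recursion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes $\psi(x) = E[\tanh(U/2)]$ with $U \sim \mathcal{N}(x, 2x)$ and $f(s,t) = \sum_j \rho_j \psi^{-1}\big([\sum_i \lambda_i \psi(s + (i-1)t)]^{j-1}\big)$ as the assumed form of the update, evaluates $\psi$ with a plain trapezoid rule, and inverts it by bisection. The names `psi`, `psi_inv`, `f`, `grows_without_bound` and the dictionaries `lam` and `rho` (degree to coefficient) are hypothetical.

```python
import math

# Trapezoid grid for a standard normal on [-8, 8]; the endpoint
# weights are negligible (about 1e-15), so they are not halved.
_H = 0.08
_Z = [-8.0 + _H * k for k in range(201)]
_W = [_H * math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi) for z in _Z]


def psi(x):
    """E[tanh(U/2)] for U ~ N(x, 2x), with psi(0) = 0."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(2.0 * x)
    return sum(w * math.tanh(0.5 * (x + r * z)) for z, w in zip(_Z, _W))


def psi_inv(y, upper=60.0):
    """Invert the increasing function psi by bisection on [0, upper]."""
    if y <= 0.0:
        return 0.0
    if y >= psi(upper):
        return upper  # saturate: the mean is effectively unbounded
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if psi(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f(s, t, lam, rho):
    """Assumed GA update: mean of check-node output messages."""
    inner = sum(l_i * psi(s + (i - 1) * t) for i, l_i in lam.items())
    return sum(r_j * psi_inv(inner ** (j - 1)) for j, r_j in rho.items())


def grows_without_bound(s, lam, rho, t_big=20.0, max_iter=300):
    """Iterate t_l = f(s, t_{l-1}) from t_0 = 0 and report the outcome."""
    t = 0.0
    for _ in range(max_iter):
        t_next = f(s, t, lam, rho)
        if t_next > t_big:
            return True   # t keeps growing: treated as convergence to infinity
        if t_next - t < 1e-7:
            return False  # stalled at a finite fixed point
        t = t_next
    return False


if __name__ == "__main__":
    lam, rho = {3: 1.0}, {6: 1.0}  # regular (3,6) ensemble as an example
    for s in (1.5, 4.0):
        print(s, grows_without_bound(s, lam, rho))
```

A threshold estimate would follow from bisecting on $s$ with `grows_without_bound`; very close to the threshold the stall test with a fixed tolerance becomes unreliable, so the tolerance and iteration cap would need to be tightened there.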

As in the case of DE [2] we can derive a stability condition:

Theorem 1 If $\lambda_2 < \lambda_2^*$, then $t$ will converge to infinity if
the initial value of $t$ is large enough. If $\lambda_2 > \lambda_2^*$, then $t$
cannot converge to infinity for any initial value of $t$, where

$$\lambda_2^* = e^{s/4} \Big/ \sum_{j=2}^{d_r} \rho_j (j-1).$$

For this model it is even possible to derive expressions for
the convergence rate of the probability of error $P_l$. In
particular, for $\lambda_2 < \lambda_2^*$, $P_l$ behaves asymptotically as the following
as $l \to \infty$:


where $a$ and $b$ are constants that depend on $\lambda(x)$, $\rho(x)$ and $s$.
These predictions fit well with the actual results using DE.

III.  Optimization  of  Degree  Sequences 

For given $\rho(x)$ and rate, we can find the optimal $\lambda(x)$ that
maximizes the noise threshold. This can be performed by
maximizing the rate subject to the normalization and the
inequality constraint in (1), which can be done using linear
programming. Optimization of $\rho(x)$ can be done similarly. We
show that, when only low error probability regions are considered, the
optimal form of $\rho(x)$ is concentrated in one or two consecutive
degrees. We have successfully optimized degree sequences
using these methods up to within 0.05 dB from the Shannon
limit for rates greater than 0.99. Good degree sequences were
also obtained for lower rates. Online demonstration of
degree sequence optimization using the GA and more results
are available at http://truth.mit.edu/~sychung.
Abstract — In this paper, we introduce a simple
technique for analyzing the iterative decoder that is
broadly applicable to different classes of codes
defined over graphs in certain fading as well as AWGN
channels. The technique is based on the observation
that the extrinsic information from constituent MAP
decoders is well approximated by Gaussian random
variables when the inputs to the decoders are Gaussian.
The independent Gaussian model implies the
existence of an iterative decoder threshold that
statistically characterizes the convergence of the iterative
decoder. Despite the idealization of the model and
the simplicity of the analysis technique, the predicted
threshold values are in excellent agreement with the
waterfall regions observed experimentally in the
literature when the code word lengths are large.

I.  Introduction 

This paper is based on a simple but powerful technique
originally developed by the first author in his Ph.D. thesis [1]
to evaluate the convergence characteristics of the iterative
decoder for various graphical codes. Independently and at
roughly the same time as [1], Richardson and Urbanke [2]
developed a rigorous method of analysis for iterative
decoding of Gallager low density parity check codes (GLDPCCC).
Their approach entails computation of density functions as
they evolve from one iteration to the next. The analysis
technique proposed in this paper is simpler to evaluate than the
density evolution technique and provides insights into the
decoder operation that would be difficult to extract using the
density evolution approach. Furthermore, despite the
idealization of the mathematical model and the simplicity of the
analysis technique, the close agreement between its predictions
and the simulation results available in the literature, including
[2], is striking.

II.  Decoder  Convergence 

Iterative decoding on graphs can be viewed as a multi-stage
decoding operation where soft information is exchanged
between the different stages. The algorithm performed in each
iteration can be either the sum-product or the min-sum
algorithm [3]. It was observed in [3] that, if inputs to the
sum-product algorithm are independent Gaussian random
variables, then the output can be tightly approximated by a
Gaussian random variable. The independent Gaussian
approximation allows for complete characterization of the turbo decoder
convergence in terms of a single parameter: the extrinsic
information signal-to-noise ratio.

In this paper, we only consider the sum-product
algorithm. Therefore, we assume that the constituent codes are
decoded by a soft-input/soft-output (SISO) maximum a
posteriori (MAP) decoder. The model developed in [4] is intended
to cover graphical codes that enjoy some symmetry in their
structure; however, with minor modifications the proposed
technique can be extended to handle certain irregular codes.
In [4], we use this model to show that it is sufficient to
characterize the extrinsic information SNR input/output relation
of the basic constituent decoder(s) to determine whether the turbo
decoder will converge at any $E_b/N_0$. This
characterization is generally possible via simple simulations. We only
need to simulate one constituent decoder, assuming symmetry,
with Gaussian extrinsic and intrinsic inputs and measure the
output extrinsic information bit error rate.

III. Application to Different Code Constructions

In [4], we analyze in detail the effect of the iterative
decoder convergence characteristics on the performance of
various graphical codes. For all of the cases considered, the
convergence results predicted by the proposed technique are
within a small fraction of a dB from the simulation results
reported in the literature. [4] also includes an interesting
asymmetric parallel concatenated code designed based on
convergence considerations.

IV.  Conclusions 

The main result established in this paper is that the
performance of graphical codes in the low SNR region is governed
by the convergence characteristics of the iterative decoder,
independent of the distance spectrum of the code. Thus,
traditional optimization of the code parameters with respect to the
distance spectrum will not in general improve the performance
in the low SNR region. The simple method developed in this
paper to analyze the iterative decoder convergence is based on
the Gaussian approximation and yields very accurate results
compared with the literature.
Abstract — A thresholding method for reduction of
dimensionality applied to test statistics of an M-ary
composite hypothesis testing problem, with the
maximum likelihood (ML) estimates incorporated instead
of the true parameters, is developed. The ML
estimates are obtained from training sets of small size.
The thresholding method selects only the entries in
the testing vector that contain a large amount of
information for discriminating among M hypotheses. The
information measure is a plug-in version of the
relative entropy with one of two distributions known.
The method is promising for the exponential family.
The performance of the test with a reduced number of
dimensions is analyzed by applying a theory of
asymptotic expansions of integrals.

I.  Modified  Test 

Consider an $M$-ary hypothesis testing problem with
populations modeled to belong to a parametric family with an
$h$-dimensional vector of parameters $\theta$ in an open subset $\Theta \subset \mathbb{R}^h$.
Assume that the vectors of parameters $\theta_m$, $m \in \mathcal{M} = \{1, \ldots, M\}$,
are unknown and distinct. Suppose that $M$ independent sets
$S_m$, one for each population, are available to estimate the
unknown parameters. Each set $S_m$ consists of a collection of $A$
i.i.d. realizations of a random vector of length $n > A$ sampled
from the $m$-th parametric distribution. Maximum likelihood
(ML) estimation often has low complexity and is often
preferred for practical pattern recognition systems (see [1]).

Independent data $R$ drawn equiprobably from one of the
$M$ populations are tested using the composite Bayes test with
ML estimates in the test statistics. When the entries in the
vectors of observations are independent, the test is

$$\hat{m} = \arg\max_{m \in \mathcal{M}} \sum_{l=1}^{n} \log p(R(l) : \hat{\theta}_m(l)), \qquad (1)$$

where $\hat{\theta}_m(l)$ are the ML-estimated parameters obtained
using the training set $S_m$. Tests of this kind are known as
plug-in tests [1]. Plug-in tests with ML estimates often
exhibit degraded performance and even a so-called peaking
phenomenon, which results from nonoptimal use of ML estimates
in the test statistics.

This work was supported in part by Grant DAAH04-95-1-0494,
by Grant N00014-98-1-06-06, and by the Boeing McDonnell
Foundation.

A method used in pattern recognition to improve
performance of the test with the plug-in test statistics is to apply
a method of dimensionality reduction [2, 3]. We take this
approach and develop a hard thresholding method to select
informative variables (features). First, define a null
hypothesis, under which the testing data have a parametric
distribution from the same family as the hypothesized populations
but with a known vector of $h$ parameters $\psi$ (a design
vector-parameter). The distribution of the null hypothesis can be
incorporated in the test statistics by applying a chain rule.
Further, the number of dimensions of the testing vector is
reduced by using a thresholding approach. According to the
method, only the entries in the testing vector that contain the
most information for discrimination among $M$ populations are
selected. The discrimination information is measured using
the following rule

$$d(\hat{\theta}_m(l), \psi(l)) > \kappa, \qquad (2)$$

where $\kappa$ is a nonnegative parameter, called the thresholding level,
and $d(\cdot,\cdot)$ is an information measure between two
distributions. Note that (2) involves the distribution of the null
hypothesis. In this work we choose $d(\cdot,\cdot)$ to be a plug-in
version of relative entropy. Invoking the null hypothesis and the
thresholding rule (2), we can obtain the test

$$\hat{m} = \arg\max_{m \in \mathcal{M}} \sum_{l=1}^{n} I\big( d(\hat{\theta}_m(l), \psi(l)) > \kappa \big) \, \log \frac{p(R(l) : \hat{\theta}_m(l))}{p(R(l) : \psi(l))}, \qquad (3)$$

where $I(\cdot)$ is an indicator function.
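To make the thresholded plug-in test concrete, here is a minimal sketch under a simplifying assumption that is not taken from the paper: each entry is Gaussian with unit variance and unknown mean, so the ML estimate is the per-entry sample mean and the plug-in relative entropy between the estimated and the null distribution is $(\hat{\theta} - \psi)^2/2$. All function names and the toy data are hypothetical.

```python
def ml_means(training_set):
    """Per-entry sample means: ML estimates for unit-variance Gaussians."""
    count = len(training_set)
    length = len(training_set[0])
    return [sum(vec[l] for vec in training_set) / count for l in range(length)]


def plug_in_divergence(theta_hat, design):
    """Relative entropy between N(theta_hat, 1) and N(design, 1)."""
    return 0.5 * (theta_hat - design) ** 2


def log_likelihood_ratio(r, theta_hat, design):
    """log p(r : theta_hat) - log p(r : design) for unit-variance Gaussians."""
    return 0.5 * (r - design) ** 2 - 0.5 * (r - theta_hat) ** 2


def selected_entries(theta_hat, design, kappa):
    """Indices l whose plug-in divergence exceeds the thresholding level."""
    return [l for l in range(len(theta_hat))
            if plug_in_divergence(theta_hat[l], design[l]) > kappa]


def thresholded_plugin_test(r, theta_hats, design, kappa):
    """Return the index m maximizing the thresholded statistic."""
    scores = []
    for theta_hat in theta_hats:
        keep = selected_entries(theta_hat, design, kappa)
        scores.append(sum(log_likelihood_ratio(r[l], theta_hat[l], design[l])
                          for l in keep))
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two populations, vectors of length n = 4, two training vectors each.
    train0 = [[2.1, 0.1, -0.1, 1.9], [1.9, -0.1, 0.1, 2.1]]
    train1 = [[-2.1, 0.1, 0.0, -1.9], [-1.9, 0.1, 0.0, -2.1]]
    theta_hats = [ml_means(train0), ml_means(train1)]
    design = [0.0, 0.0, 0.0, 0.0]  # null-hypothesis means
    r = [1.5, 3.0, -3.0, 2.5]      # entries 1 and 2 carry no class information
    print(selected_entries(theta_hats[0], design, 0.5))
    print(thresholded_plugin_test(r, theta_hats, design, 0.5))
```

In this toy setup the two middle entries have estimated means near the null value, so the indicator drops them and the large but uninformative observations there do not influence the decision.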

II.  Performance  Analysis 

We analyze the performance of the modified test in (3) by
first using Monte-Carlo simulations and then by applying a
theory of asymptotic expansions of integrals. If $A$ is a
parameter of approximation, the moment generating function of
the modified test statistic (assume $M = 2$) is a product of
(semidefinite or definite) integrals each with a kernel of
exponential type. Under conditions stated in [4], each of these
integrals can be asymptotically approximated to an arbitrary
order in $(1/A)$ (we consider $O(A^{-2})$) by applying the Mellin
transform method. The approximation depends on the
location of the true parameters in parameter space relative to the
solutions of the equation $d(\theta_m(l), \psi(l)) = \kappa$, $l = 1, \ldots, n$ [5].

The results are applied to complex Gaussian models that
appear in automatic target recognition problems [5].
Abstract — Principal curves, like principal
components, are a tool used in multivariate analysis for ends
like feature extraction. Defined in their original form,
principal curves need not exist for general
distributions. The existence of principal curves with bounded
length and a learning algorithm for such curves for
any distribution that satisfies some minimal
regularity conditions has been shown. We define principal
curves with bounded turn, show that they exist, and
present a learning algorithm for them.

I.  Introduction 

Principal component analysis is a widely used tool in
multivariate data analysis for purposes such as dimension reduction
and feature extraction. A generalization of the idea of
principal components to principal curves was introduced by Hastie
and Stuetzle in [2]. Principal curves by their definition in
[2], however, are not guaranteed to exist for any distribution.
Kégl et al. [3] provided a new definition for principal curves
with bounded length, and showed that such curves exist for
any distribution with bounded second moment. They also
derived a learning algorithm for such curves. Due to the length
constraint, the treatment in [3] does not encompass the case
of classical principal component analysis. In this paper, we
penalize the turn of a curve instead of its length, and look for
principal curves within the class of curves of bounded total
turn. The appeal of this approach consists partly in the fact
that principal components are a special case of such principal
curves wherein the total turn is 0. We define principal curves
with bounded turn, show that they exist, and also analyze
an algorithm for learning such curves. Our approach to the
problem follows very closely that in [3].

II.  Preliminaries  and  Notation 

Definition 1 A curve in $\mathbb{R}^d$ is defined as a continuous
function $f : I \mapsto \mathbb{R}^d$, where $I$ is an interval on $\mathbb{R}$ (possibly infinite,
but a closed subset of $\mathbb{R}$).

Consider a curve $f$ and a point $x \in \mathbb{R}^d$. We define the
projection of $x$ onto $f$ and the distortion due to this projection
in the natural way. For a random variable $X$, we define the
distortion $\Delta(f)$ of curve $f$ as the expected distortion due to
projection of $X$ onto $f$.

Definition 2 Given a random variable $X$, we say that $f$ is
a principal curve for $X$ in a class of curves $C$ if $f \in C$ and
$\Delta(f) = \inf_{g \in C} \Delta(g) = \Delta^*_C$.

We define the turn $\kappa(f)$ of a curve $f$ as in [1] so that it
generalizes the notion of total integral curvature of a curve to
nonsmooth curves in a natural way.
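For polygonal lines both quantities are easy to compute, which the following minimal sketch illustrates. It rests on two assumptions not spelled out in the text: the distortion of a projection is taken to be the squared Euclidean distance to the nearest point of the curve, and the turn of a polygonal line is the sum of the angles between consecutive segments (consecutive vertices are assumed distinct). The function names are hypothetical, and the expectation is replaced by an average over sample points.

```python
import math


def turn(vertices):
    """Total turn of a polygonal line: sum of angles between consecutive segments."""
    total = 0.0
    for a, b, c in zip(vertices, vertices[1:], vertices[2:]):
        u = [q - p for p, q in zip(a, b)]
        v = [q - p for p, q in zip(b, c)]
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        # Clamp to [-1, 1] to guard against rounding before acos.
        cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
        total += math.acos(cosine)
    return total


def squared_distance_to_segment(x, a, b):
    """Squared distance from point x to the segment with endpoints a and b."""
    ab = [q - p for p, q in zip(a, b)]
    ax = [q - p for p, q in zip(a, x)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0.0:
        s = 0.0
    else:
        s = sum(p * q for p, q in zip(ab, ax)) / length_sq
        s = max(0.0, min(1.0, s))  # projection restricted to the segment
    projection = [p + s * c for p, c in zip(a, ab)]
    return sum((p - q) ** 2 for p, q in zip(x, projection))


def empirical_distortion(points, vertices):
    """Average squared distance from the sample points to the polygonal line."""
    segments = list(zip(vertices, vertices[1:]))
    total = sum(min(squared_distance_to_segment(x, a, b) for a, b in segments)
                for x in points)
    return total / len(points)


if __name__ == "__main__":
    straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    corner = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    sample = [(0.5, 1.0), (3.0, 0.0)]
    print(turn(straight), turn(corner))
    print(empirical_distortion(sample, straight))
```

A straight line has zero turn, which matches the remark that principal components are the special case of total turn 0.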

1 This work was supported in part by the National Science
Foundation under NYI grant IRI-9457645 and grant ECS-9873451.
III. Existence of Principal Curves

The main idea in the construction is to use the compactness
property of the set of curves of bounded turn within a compact
subset of $\mathbb{R}^d$. We know that for any $C_K$, there exists a sequence
of curves in $C_K$ whose distortions converge to $\Delta^*_{C_K}$. From this
sequence, we construct a subsequence of curves such that this
subsequence converges on any compact subset of $\mathbb{R}^d$. We then
obtain a “limiting” curve from this subsequence and show that
it achieves the minimum distortion in the class and, therefore,
is a principal curve.

We need to impose more stringent regularity conditions on the class of curves we consider to ensure that minimizers of our objective function exist: if curves are permitted to accumulate their turn arbitrarily far from the origin, the “limit” of these curves may be not a curve but a union of curves. To circumvent this problem, we impose a uniform bound on the rate at which the turn accumulated within $B_R$ converges to the total turn of the curve, i.e., we fix $\tau(R)$ continuous and decreasing in $R$ and consider the class of curves

$$\mathcal{C}_K = \{ f \text{ such that } \kappa(f) \le K,\ \kappa(f) - \kappa(f|_{B_R}) \le \tau(R) \} \qquad (1)$$

Proposition 1 Consider the class of curves $\mathcal{C}_K$ as detailed in (1). If $E[\|X\|^2] < \infty$, then there exists a principal curve in $\mathcal{C}_K$.

As in [3], we may also derive a result on learning such principal curves from i.i.d. data (imposing some extra regularity on $F_X$). To arrive at the principal curve, we resort to empirical risk minimization. With a finite amount of data, we cannot optimize over the entire class $\mathcal{C}_K$, as this may lead to overfitting. Hence, we choose a sequence of classes of increasing complexity within which the optimization is conducted. Just as in [3], we consider classes of polygonal lines with an increasing number of segments. One distinction is that we also expand the set in which these polygonal lines lie, since the random variable $X$ is not assumed to be bounded.

Proposition 2 Suppose that $E[\|X\|^2 \mathbf{1}_{B_R^c}(X)] \le R^{-\alpha}$. Then there exists an algorithm to produce a sequence of estimates $f_n$ such that

$$\Delta(f_n) - \Delta(f^*) \sim O\left(n^{-\alpha/(6+3\alpha)}\right).$$

References 

[1] A.D. Alexandrov, Yu.G. Reshetnyak, General Theory of Irregular Curves, Mathematics and Its Applications (Soviet Series), Kluwer, vol. 29, 1989.

[2] T. Hastie, W. Stuetzle, “Principal Curves”, Journal of the Amer. Stat. Ass., pp. 502-16, 1989.

[3] B. Kegl, A. Krzyzak, T. Linder, K. Zeger, “Learning and Design of Principal Curves”, IEEE Trans. PAMI, to appear.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT 2000, Sorrento, Italy, June 25-30, 2000


Reduced-State  BCJR-type  Algorithms 

G.  Colavolpe,  G.  Ferrari  and  R.  Raheli 

Dipartimento di Ingegneria dell’Informazione, Università di Parma, Parco Area delle Scienze 181/A, I-43100 Parma, Italy


Abstract  —  In  this  paper,  we  propose  a  technique 
to  reduce  the  number  of  trellis  states  in  BCJR-type 
algorithms,  i.e.,  algorithms  with  a  structure  similar 
to  that  of  the  well-known  algorithm  by  Bahl,  Cocke, 
Jelinek  and  Raviv  (BCJR).  This  work  is  inspired  by 
reduced-state  sequence  detection  (RSSD).  The  key 
idea is the construction, during one of the recursions, of a “survivor map,” relative to the reduced-state trellis, to be used in the other recursion.

I.  BCJR-type  algorithms 

We assume that a source emits a sequence of independent and identically distributed information symbols $\{a_k\}$, which is transmitted through a channel modeled as having finite memory, possibly by means of some approximations as in [1]. Denoting by $\mathbf{x}_1^K = \{\mathbf{x}_k\}_{k=1}^{K}$ the sequence of samples at the input of the receiver, where $K$ is the transmission length and $\mathbf{x}_k$ is the observation vector at the $k$-th signaling interval, and by $e_k(m', m)$ the branch which connects state $S_k = m'$ to state $S_{k+1} = m$, we assume that the BCJR algorithm [2] can be generalized as

$$P(a_k = i \mid \mathbf{x}_1^K) = P\{a_k = i\} \sum \gamma_k(e_k)\, \alpha_k(e_k)\, \beta_k(e_k)\, P\{S^-(e_k)\}$$

where $S^-(e_k)$ is the beginning state of transition $e_k$. The sum in the above formula is extended over all transitions of epoch $k$ associated with information symbol $a(e_k) = i$. Similarly to the BCJR algorithm, we assume that we can compute the probability density functions $\alpha_k(e_k)$ and $\beta_k(e_k)$ by means of a forward and a backward recursion [1, 2, 3].
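To make the forward and backward recursions concrete, here is a minimal sketch of the classical state-based forward-backward pass, not the transition-based generalization used above. It assumes strictly positive branch metrics `gamma[k][mp][m]` are already available, a uniform initial state distribution, and an unterminated trellis; the function name is hypothetical.

```python
def branch_posteriors(gamma, num_states, init=None):
    """Classical forward-backward pass on a trellis.

    gamma[k][mp][m] is the positive branch metric of the transition from
    state mp at epoch k to state m at epoch k + 1. Returns post[k][mp][m],
    the a posteriori probability of each branch at epoch k.
    """
    K, S = len(gamma), num_states
    if init is None:
        init = [1.0 / S] * S  # assumption: uniform initial state distribution

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward recursion: alpha[k][m] is proportional to the joint
    # probability of state m at epoch k and the past observations.
    alpha = [normalize(init)]
    for k in range(K):
        nxt = [sum(alpha[k][mp] * gamma[k][mp][m] for mp in range(S))
               for m in range(S)]
        alpha.append(normalize(nxt))

    # Backward recursion: beta[k][mp] is proportional to the probability
    # of the future observations given state mp at epoch k.
    beta = [None] * (K + 1)
    beta[K] = [1.0] * S  # assumption: the trellis is not terminated
    for k in range(K - 1, -1, -1):
        prev = [sum(gamma[k][mp][m] * beta[k + 1][m] for m in range(S))
                for mp in range(S)]
        beta[k] = normalize(prev)

    # Combine the three factors on each branch and normalize per epoch.
    post = []
    for k in range(K):
        w = [[alpha[k][mp] * gamma[k][mp][m] * beta[k + 1][m]
              for m in range(S)] for mp in range(S)]
        total = sum(sum(row) for row in w)
        post.append([[x / total for x in row] for row in w])
    return post
```

The a posteriori probability of a symbol value at epoch $k$ is then obtained by summing `post[k]` over the branches labeled with that value.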

II.  Principle  of  a  reduced-state  BCJR-type  algorithm 

A single transition in the full-state trellis can be related to $\nu$ information symbols, that is, $e_k = (a_{k-\nu+1}, \ldots, a_k)$. Without going into the details as done in [4], indicating by $s_k$ a state in the reduced-state trellis, we simply identify the state reduction by assuming that a transition $\epsilon_k = (s_k, s_{k+1})$ in the new trellis is equivalent to a sequence $(a_{k-Q+1}, \ldots, a_k)$ of information symbols, with $Q < \nu$. In the reduced-state trellis we may
define  by  e^}_j(ek)  the  sequence  of  the  most  likely  transitions 
(ek-j-i+i,...,ik-j)  =  (ak-j-i-Q+j, . . .  ,ak-j-Q+i)  along  the 
survivor that ends in $\epsilon_k$. As $\alpha_k(e_k)$ can be calculated through a forward recursion in the full-state trellis, a similar recursion holds for $\alpha_k(\epsilon_k)$ in the reduced-state trellis. In the logarithmic domain, we may write

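$$\bar{\alpha}_k(\epsilon_k) \simeq \max \{ \psi_k(\epsilon_{k-1}, \epsilon_k) + \bar{\alpha}_{k-1}(\epsilon_{k-1}) + \ln P\{a_{\mathrm{old}}(\epsilon_{k-1})\} \}$$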

where $\psi_k(\epsilon_{k-1}, \epsilon_k)$ is a suitable logarithmic probability density function and $a_{\mathrm{old}}(\epsilon_{k-1})$ indicates the information symbol lost in the transition $\epsilon_{k-1}$. For each transition $\epsilon_k$, the transition that maximizes the partial metric $\psi_k(\epsilon_{k-1}, \epsilon_k)$ +

This  work  was  supported  in  part  by  Ministero  dell’Universita  e 
della  Ricerca  Scientifica  e  Tecnologica  (MURST),  Italy. 


Fig. 1: Application of the proposed technique to iterative detection, through linear prediction, over a flat-fading channel. (Horizontal axis: $E_b/N_0$ [dB].)

+  lnP{a0w(£(.-i)}  should  be  stored  (equivalently, 
we  could  store  )  or  a.” }“,  the  symbol  discarded  in  the 

transition  ).  Keeping  track  of  the  survivors  associated  to 
each  transition  in  the  forward  recursion,  we  build  a  “survivor 
map”  to  be  used  in  the  backward  recursion. 

The proposed reduced-state technique can be successfully applied to various cases where iterative decoding can be employed: coherent detection over channels affected by intersymbol interference (ISI) (assuming perfect knowledge of the ISI channel coefficients), noncoherent detection as proposed in [1], and fading channels.

In Fig. 1, we consider iterative detection, based on linear prediction, over a Rayleigh flat-fading channel, referring to the concatenated scheme (outer convolutional code and inner differential code) proposed in [5]. The performance for various levels of complexity (in terms of prediction order $\nu$ and reduced-state parameter $Q$ of the inner differential detector) is shown. The considered numbers of iterations are 1 and 6 in all cases. The performance in the case of decoding with perfect knowledge of the fading coefficients is also shown (solid lines). The normalized fading rate is $f_{D\max} T = 0.01$.
I.  Introduction 

In this paper we show that the log-likelihood of finite mixture models is approximately concave as a function of the number of mixture components $k$. A corollary of this result is that the penalized log-likelihood will also be approximately concave (as a function of $k$) if the penalty term is itself strictly concave or linear in $k$ (true, for example, for BIC [1]). These results have a number of significant practical implications for parameter estimation [2] and model selection [3, 4] in a mixture context.

II.  Necessary  Conditions  on  the  Mixture 
Components 

Our results require three assumptions on the functional form of the components in the finite mixture models being used to fit the data (assumptions which are commonly met in mixture models used in practice):

1. Each model of complexity $k$ contains each model of complexity $k' < k$ as a special case (i.e., it can be reduced to a model of lower complexity by a suitable choice of parameters).

2. Any two models of complexities $k_1$ and $k_2$ can be combined as a convex weighted sum in any proportion to yield a valid model of complexity $k = k_1 + k_2$.

3. Each model of complexity $k = k_1 + k_2$ can be decomposed into a convex weighted sum of two valid models of complexities $k_1$ and $k_2$, respectively, for each valid choice of $k_1$ and $k_2$.

III.  Concavity 

We wish to fit a finite mixture model probability density function (PDF) to the data $U$ consisting of data points $x_i$, $U = \{x_1, x_2, \ldots, x_n\}$, of the form $f(x; \theta) = \sum_{j=1}^{k} \alpha_j \phi_j(x; \theta_j)$, where the $\phi_j$ are basis functions of the mixture model, each with a corresponding set of parameters $\theta_j$. Assuming that the $x_i$ are independent (conditioned on the model $f$), the log-likelihood is defined as $l(\theta \mid U) = \sum_{i=1}^{n} \ln f(x_i; \theta)$.

Theorem  1: 

Assuming a mixture model that satisfies the three assumptions from Section II, the log-likelihood is first-order concave, i.e.,

$$l_{k+1} - 2 l_k + l_{k-1} \le 0, \qquad (1)$$

within first order, where the quantities $l_k$ and $l_{k \pm 1}$ are log-likelihoods of the best $k$- and $(k \pm 1)$-component models, i.e., the models with $k$ and $k \pm 1$ components which achieve the maximum of the likelihood function.

'This  work  was  supported  by  NSF  CAREER  award  IRI-9703120 
and  by  the  University  of  California  MICRO  program. 


Fig.  1:  Maximum  log-likelihood  and  BIC  as  a  function  of  k  for 
Markov  mixtures  fitted  to  sequences  from  a  Web  data  set. 

From  the  theorem  above  an  obvious  corollary  is  that  if  an 
additive  penalty  term  to  the  log-likelihood  is  strictly  concave 
or  linear  in  k,  then  this  implies  first-order  concavity  of  this 
penalized  log-likelihood  (BIC  being  such  an  example). 
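Condition (1) and its corollary are easy to check numerically on a fitted sequence of maximized log-likelihoods. The sketch below is illustrative only: the function names are hypothetical, and the BIC-style penalty assumes that the number of free parameters grows linearly in $k$ (a fixed `params_per_component`), which is an assumption of this example rather than a statement from the text.

```python
import math


def second_differences(values):
    """l_{k+1} - 2 l_k + l_{k-1} for every interior index k."""
    return [values[i + 1] - 2.0 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)]


def is_concave(values, tol=0.0):
    """True if every second difference is at most tol."""
    return all(d <= tol for d in second_differences(values))


def bic_penalized(loglik, params_per_component, n_samples):
    """Penalized log-likelihood for k = 1, 2, ... components.

    Assumption: the penalty is (d_k / 2) ln N with d_k linear in k.
    """
    return [l - 0.5 * params_per_component * (k + 1) * math.log(n_samples)
            for k, l in enumerate(loglik)]
```

Because a penalty that is linear in $k$ has zero second difference, the penalized sequence has the same second differences as the log-likelihood itself, so concavity carries over.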

Figure 1 shows an empirical example of apparent concavity for a mixture of Markov chains fitted to over 100,000 page-request sequences from a large commercial Web site. Note that BIC as a function of $k$ is unimodal, as predicted within first order by theory. This unimodality is a useful practical property in searching for the best model within a large model family, as in this example.

Li  and  Barron  [5]  have  shown  in  related  work  that  the  log- 
likelihood  for  any  k  is  bounded  above  by  a  function  of  the 
form  C/k  where  C  is  a  constant  which  is  independent  of  k. 
The  results  presented  here  are  complementary  in  the  sense 
that  we  show  that  the  actual  maximizing  log-likelihood  itself 
is  concave  to  first-order  as  a  function  of  k. 
Abstract — The normalized maximum likelihood distribution as a code minimizes the mean code length distance to the ideal target, defined by the negative logarithm of the maximized likelihood of a parametric class of models, where the mean is taken with respect to the worst case model outside the parametric class. The same minmax bound is in essence the lower bound for all codes when the mean is taken with respect to almost all distributions that minimize the mean ideal target. These results strengthen the known bound when the mean is restricted to the parametric class.


I.  Introduction 

Two fundamental types of universal code defining distributions for a parametric model class $\mathcal{M}_k = \{P(x^n; \theta)\}$, where $\theta$ ranges over a subset $\Omega$ of the $k$-dimensional Euclidean space, are the mixture

$$P_w(x^n) = \int_{\Omega} P(x^n; \theta)\, w(\theta)\, d\theta \qquad (1)$$

and the Normalized Maximum Likelihood (NML) distribution [4]

$$\hat{P}(x^n) = \frac{P(x^n; \hat{\theta}(x^n))}{\sum_{y^n} P(y^n; \hat{\theta}(y^n))}. \qquad (2)$$

Here, $w$ is a density function on the parameters, often called a ‘prior’ although no prior knowledge in the Bayesian sense is required, and $\hat{\theta}(x^n)$ is the ML estimate. The mixture for a special prior minimizes the worst case redundancy [2], $\min_q \max_\theta E_\theta \log(P(X^n; \theta)/q(X^n))$, which also defines the capacity of a related channel. In [5] this was generalized to minimizing the worst case relative redundancy

$$\min_q \max_g E_g \log(P(X^n; \theta_g)/q(X^n)), \qquad (3)$$

where $\theta_g$ minimizes $-E_g \log P(X^n; \theta)$ and where the expectation is to be taken with respect to a distribution outside the model class $\mathcal{M}_k$ satisfying certain conditions. Asymptotically the minmax relative redundancy was reached by a modified Jeffreys’ mixture.

The normalized ML distribution, too, solves a minmax problem due to Shtarkov [4], but of a very different kind,

$$\min_q \max_{x^n} \log \frac{P(x^n; \hat{\theta}(x^n))}{q(x^n)}. \qquad (4)$$

The first contribution of this paper is to show that the normalized ML distribution also solves the following minmax problem

$$\min_q \max_{g \in G} E_g \log \frac{P(X^n; \hat{\theta}(X^n))}{q(X^n)} = \log C_n(k), \qquad (5)$$

where the expectation is taken with respect to virtually any nonsingular distribution $g(x^n)$.

We then have a nice symmetrical situation in that the two universal distributions, the modified Jeffreys’ mixture and the normalized ML distribution, are solutions to their closely related minmax problems, (3) and (5), respectively. These are indeed close, since for iid models $\hat{\theta}(x^n) \to \theta_g$ with $g$-probability 1. As in [1] for the case where $G = \mathcal{M}_k$, one can interpret $-E_g \log P(X^n; \hat{\theta}(X^n))$ as the mean of an ideal but unreachable target code length, and the minimizing $q$ as the reachable distribution that is closest to the ideal target in the mean code length sense.

In [3] this result was strengthened as follows. Let $G_\theta = \{g : \theta_g = \theta\}$, and define

$$g(\theta) = \arg\min_{g \in G_\theta} E_g \log 1/f(X^n; \hat{\theta}(X^n))$$

as the most ‘benevolent’ distribution for model $f(X^n; \theta)$ giving the shortest mean ideal target.

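Theorem 1 Let $\mathcal{M}_k$ be an exponential family. Then

$$\log C_n(k) = \frac{k}{2} \log \frac{n}{2\pi} + \log \int_{\Omega} |I(\theta)|^{1/2}\, d\theta + o(1), \qquad (6)$$

where $|I(\theta)|$ is the Fisher information. Moreover, for any distribution $q(x^n)$ and any $\epsilon$

$$E_{g(\theta)} \log 1/q(X^n) \ge E_{g(\theta)} \log 1/f(X^n; \hat{\theta}(X^n)) + \frac{k - \epsilon}{2} \log n$$

except for $\theta \in A_n$, where the volume of $A_n$ goes to zero as $n \to \infty$.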
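As a numerical illustration of (6), consider the Bernoulli family ($k = 1$), chosen here only as an example. The normalizer is the sum of maximized likelihoods over all binary sequences, which can be grouped by the number of ones. For this family $I(\theta) = 1/(\theta(1-\theta))$ and $\int_0^1 |I(\theta)|^{1/2} d\theta = \pi$, so (6) predicts $\ln C_n \approx \frac{1}{2}\ln\frac{n}{2\pi} + \ln \pi$. Function names are hypothetical and natural logarithms are used.

```python
import math


def bernoulli_nml_normalizer(n):
    """Exact C_n: sum over all binary x^n of P(x^n; theta_hat(x^n))."""
    total = 0.0
    for ones in range(n + 1):
        # log of the number of sequences with this many ones
        log_binom = (math.lgamma(n + 1) - math.lgamma(ones + 1)
                     - math.lgamma(n - ones + 1))
        # log of the maximized likelihood, with the convention 0 * log 0 = 0
        log_ml = 0.0
        if ones > 0:
            log_ml += ones * math.log(ones / n)
        if ones < n:
            log_ml += (n - ones) * math.log((n - ones) / n)
        total += math.exp(log_binom + log_ml)
    return total


def asymptotic_log_normalizer(n):
    """Right-hand side of (6) for the Bernoulli family, without the o(1) term."""
    return 0.5 * math.log(n / (2.0 * math.pi)) + math.log(math.pi)
```

Comparing `math.log(bernoulli_nml_normalizer(n))` with `asymptotic_log_normalizer(n)` for growing `n` shows the gap shrinking, in line with the $o(1)$ term.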
Abstract — We offer two noiseless codes for representing blocks of $n$ integers $X^n$ generated independently by a source characterized by an unknown monotone probability function. Though assumed monotone, the source is allowed arbitrary entropy $H \ge 0$, including zero. Our first coding procedure is illustrative, yet universal in the strong sense that the expected value of the code length $L(X^n)$ is dominated by a linear function of the source entropy, $E\,L(X^n) \le c_0 + c_1 n H$. Our second procedure is asymptotically optimal in the sense that $E\,L(X^n) \le nH + o(nH)$. We discuss the implications of these coding procedures for model selection using MDL.

I.  Introduction 

Consider the problem of encoding a finite collection of $n$ positive integers, $X^n = (X_1, \ldots, X_n)$, into a prefix code of shortest expected length. The component terms $X_i \ge 1$ are independent, integer-valued random variables that share the common, unknown monotone probability distribution $F$. If $X \sim F$ denotes a random variable with distribution $F$, then $\Pr(X = i) = p_i \ge p_{i+1}$, $i = 1, 2, \ldots$, but the $p_i$ are otherwise arbitrary. In particular, the entropy


$$H = -\sum_i p_i \log p_i$$


$$\frac{E\,L(X^n)}{\max(1, H(X^n))} \le c_0 + c_1.$$


II.  Results 

Our  first  result  is  to  show  that  a  simple  modification 
of  the  concatenation  of  scalar  universal  codes  produces  a 
universal  code  with  Co  =  3  and  ci  =  § .  Surprisingly,  the  only 
modification  is  the  optional  compression  of  the  leading  bits  of 
each  universal  code  so  that  the  code  is  competitive  when  the 
source  entropy  is  near  0.  Our  second  result  is  to  extend  this 
approach  significantly  to  produce  an  asymptotically  optimal 
code  for  sources  with  arbitrary  entropy.  Specifically,  we  prove: 

Theorem  1.  There  exists  a  uniquely  decodable  prefix  code  for 
Xn  whose  length  function  L(Xn)  satisfies 


EL(Xn) 
lim  - - — 

nH-*  oo  fill 


2  log  log(nff) 
log(nR) 


1 

log  nH 


can be 0, in which case all $X_i = 1$.

We wish to encode $X^n$ as efficiently as possible for the given sample size $n$, regardless of the entropy of the underlying source. Given $F$, one can construct an arithmetic coder whose code length $L_F(X^n)$ is on average within one bit of the minimum attainable length,

$$nH \le E\,L_F(X^n) \le 1 + nH.$$

If $F$ is unknown, we seek a universal code whose loss relative to this utopian performance is limited. In particular, we seek to encode $X^n$ so that the length of the resulting prefix code $L(X^n)$ is bounded in expectation by a linear function of the entropy of the source,

$$E\,L(X^n) \le c_0 + c_1 H(X^n) = c_0 + c_1 n H,$$

where the constants $c_0$ and $c_1 \ge 1$ are invariant of $n$ and $F$. Such a code is universal in the sense described by Elias [1]; the ratio of the expected code length to the minimum attainable message length is bounded for all allowed sources,


Thus,  the  relative  redundancy  goes  to  0,  asymptotically,  as 
the  minimum  expected  number  of  bits  goes  to  infinity.  Our 
final  goal  is  to  provide  a  firm  upper  bound  on  the  code  length 
for  all  sequences.  To  this  end  we  prove: 

Theorem  2.  There  exists  a  uniquely  decodable  prefix  code  for 
Xn  whose  length  function  L(Xn)  satisfies 


$$E\,L(X^n) \le 1 + H(X^n) \left[ 1 + O\left( \frac{\log \log \log n}{\log \log n} \right) \right].$$


This code has a particular goal in mind: model selection using the minimum description length (MDL). In that setting, the $X_i$ represent the absolute value of rounded, standardized parameter estimates in a statistical model, such as the coefficients in a multiple regression equation.
We also want to use codes that are optimal in the sense of having small values for the constants $c_0$ and $c_1$.
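For readers unfamiliar with scalar universal codes for the positive integers, the Elias gamma code is one classical example. The text does not say which scalar code is concatenated, so the sketch below is only an illustration of the idea, with hypothetical function names. Each integer $x$ is sent as $\lfloor \log_2 x \rfloor$ zeros followed by the binary expansion of $x$, and a block $X^n$ is coded by concatenation.

```python
def elias_gamma_encode(x):
    """Elias gamma codeword of a positive integer, as a string of bits."""
    if x < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(x)[2:]  # binary expansion, always starts with '1'
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode_stream(bits):
    """Decode a concatenation of Elias gamma codewords."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # the zero prefix gives the length of the rest
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def encode_block(xs):
    """Concatenate the scalar codewords of a block of positive integers."""
    return "".join(elias_gamma_encode(x) for x in xs)
```

Smaller integers get shorter codewords, which suits a monotone distribution. Note that the all-ones block still costs $n$ bits under plain concatenation even though its entropy is 0, which is the regime where some further compression is needed for a bound of the form $c_0 + c_1 n H$.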
Abstract — A lower bound on the achievable redundancy for universal lossless coding of parametric sources with abruptly changing statistics is derived. Unlike the previously known bound for a problem that assumes a fixed number of changes in the statistics, the new bound is general and can be used even if the number of changes increases with the data length.

The universal lossless coding problem of Piecewise Stationary Sources (PSS’s), namely, sources with abruptly changing statistics, has significant practical importance. This results from the fact that data sequences from a large family of practical applications can be modeled as being emitted from a source in this class.

A PSS is uniquely defined by the parameter $\psi = (\theta, t)$. The vector $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$ is the set of $k$-dimensional parameters that govern the statistics in each of $q$ stationary statistically independent segments. The vector $t = (t_1, t_2, \ldots, t_{q-1})$ represents the set of transition times between stationary segments. The redundancy of a code with length function $L(\cdot)$ for $n$-sequences governed by $\psi$ is defined as

$$R_n(L, \psi) = \frac{1}{n} E_\psi L(X^n) - H_\psi(X^n), \qquad (1)$$

where $X^n$ is a random sequence, $E_\psi$ is the expectation for the given PSS, and $H_\psi$ is the per-letter average entropy of $\psi$.

In [1], Merhav derived a lower bound on the redundancy of any universal lossless code for a somewhat artificial particular case, where it is assumed that $q$ remains fixed even if $n$ grows. Merhav showed that for every code with length function $L(\cdot)$, the average universal coding redundancy over all sequences of $n$ letters, drawn from almost every PSS $\psi$ with a fixed number of stationary segments $q$, is lower bounded by

$$R_n(L, \psi) \ge (1 - \epsilon) \left( \tfrac{1}{2} k q + q - 1 \right) \frac{\log n}{n} \qquad (2)$$

where $\epsilon > 0$ can be arbitrarily small.

In various recent works, different approaches were used to develop low complexity, strongly sequential, compression algorithms specifically designed to code memoryless PSS’s. Recently (see [3]), it was shown that even if $q$ grows, there exist such coding schemes that achieve redundancy of

$$R_n(L, \psi) \le (1 + \epsilon) \left( \tfrac{1}{2} k q + q - 1 \right) \frac{\log m}{n} \qquad (3)$$

for every PSS, where $m = n/q$ is the average segment length and $\epsilon > 0$ can be arbitrarily small.

In this work we show that in the general case, where $q$ is allowed to grow with $n$ (but at a slower rate), there exists a lower bound that asymptotically meets the upper bound in (3). First, let $\Lambda_q$ be the class of all PSS’s with $q$ segments. Then, define a subclass $\Lambda_\epsilon \subset \Lambda_q$ as follows: If $q \to \infty$, $\Lambda_\epsilon$ contains all $\psi \in \Lambda_q$ for which almost all segments are sufficiently long (at least $m^{1-\epsilon}$ time units) and almost all transitions are sufficiently large (at least of Euclidean distance of $m^{-\epsilon}$). Otherwise, $\Lambda_\epsilon$ contains all $\psi \in \Lambda_q$ for which all segments are sufficiently long and all transitions sufficiently large. It can be shown that in either case, $\Lambda_\epsilon$ contains almost all sources in $\Lambda_q$, in the sense that under the uniform prior (distribution) $\mu(\cdot)$ over all possible sources in $\Lambda_q$, $\mu(\Lambda_\epsilon) \to 1$ as $n \to \infty$.

Next, partition the subclass $\Lambda_\epsilon$ into disjoint sets $\varphi = (\psi^1, \ldots, \psi^{M_\varphi})$, each with $M_\varphi \ge M$ points $\psi^i \in \Lambda_\epsilon$, such that any set $\varphi$ contains the largest possible number of sources $\psi$, distinguishable by $X^n$, for which the parameters for all short segments and small transitions are identical. A set $\varphi$ is distinguishable by $X^n$ if for any source $\psi^i \in \varphi$, the probability that an $X^n$ generated by $\psi^i$ appears to be generated by $\psi^j \in \varphi$ for $j \ne i$ goes to zero.

By the random coding version of the redundancy-capacity theorem (see [2]), if $\mu(\Lambda_\epsilon) \to 1$, and all possible sets $\varphi$ are distinguishable by $X^n$, then the redundancy of every code for almost every source $\psi \in \Lambda_q$, except for a set of sources $B$ for which $\mu(B) \to 0$, is lower bounded by

$$R_n(L, \psi) \ge (1 - \epsilon) \frac{\log M}{n}, \qquad (4)$$

where $\epsilon > 0$ can be arbitrarily small. Lower bounding the maximum $M$ that satisfies the above condition, and using (4), the redundancy for almost all $\psi \in \Lambda_q$ is lower bounded by

$$R_n(L, \psi) \ge (1 - \epsilon) \left( \tfrac{1}{2} k q + q - 1 \right) \frac{\log m}{n} \qquad (5)$$

for  any  parametric  PSS  of  practical  interest.  If  q  m,  in 
order for all sources $\psi$ in a set $\varphi$ to be distinguishable, the choices of $\delta q$ long segments, and $\delta q$ large transitions, for a $\delta > 0$ that can be arbitrarily small, are constrained by the choices of the other parameters for any source $\psi \in \varphi$. This reduces $M_\varphi$, but negligibly, resulting in the same lower bound as in (5). The lower bound above confirms the optimality of schemes that achieve the redundancy in (3).
Abstract  —  We  present  bounds  of  the  redundancy  II.  BLOCK-SORTING  ALGORITHM 

of  the  recency-rank  [2]  and  the  block-sorting  [1]  uni-  Let  ^  be  the  Burrows-Wheeler  Transform  (BWT)  of  xqk 
versal  lossless  data  compression  algorithms  for  finite-  with  *._symbol  extension  and  a k{xqk)  be  the  pointer  of  the  se- 
length  sequences.  quence.  We  use  the  recency-rank  algorithm  tpk  with  fc-symbol 

I  RECENCY-RANK  Algorithm  extension  to  encode  o(xqk)  and  a  lossless  encoder  Sq  to  encode 

We  call  the  mapping  £  :  An  -4  An  k-block  permutation  if  there  a(xqk).  Then  the  block-sorting  algorithm  4?j.  is  defined  by 


exists  a  permutation  *■„»  :  {1,2,...  ,q}  -4  {1,2,...  ,q}  such  .  ,  n 

,  U)k  In  ,  *k[x  )  =  'pk{<Tk{x'  )*xqk+1)*6q{ah{x  )). 

that  £(x  )  =  |*>=a  V«»(J)-i]*4iJ  *  xqk+u  where  q  =  [ n/k\ 

and  n  =  qk  +  r.  We  assume  that  both  the  encoder  and  Since  BWT  is  a  fc-block  permutation,  we  do  not  need  the 

the  decoder  have  £.  For  yn  €  An,  we  encode  yqk  by  the  BWT  to  perform  the  universal  asymptotic  optimality  when 

recency-rank  algorithm  with  fc-symbol  extension  and  yqk+i  the  lossless  encoder  is  the  recency-rank  algorithm  with  the 

using  some  fixed-length  lossless  code.  Let  pk  be  the  encoder  extension  of  alphabet.  However,  due  to  sorting  in  the  lex- 

and  H(pk{xn))  be  the  non-overlapping  fc-block  empirical  en-  icographical  order,  symbols  with  the  same  context  are  gath- 

tropy  function  estimated  from  xn .  .  ered  by  the  BWT.  This  provides  the  good  performance  for  the 

„  r  ,  ,  ,  ,  , ,  ,  ...  ™  block-sorting  algorithms.  We  construct  the  code  <£&  defined 

Theorem!  Let  £  be  a  k-block  permutation.  I  hen  ^ 

<  \H{Pk{xn))  +  i  log2 k  $k(xn)  =  w((r*(TV‘)  *  <fc+1)  *  5q(ak(T*xqk))  *  6k(s), 

1.  L  2fc|A|*’1  1.  .  2fc|A|<:  where  s  =  argmin0<s<jt-i  t(<pk((rk(Tsxqk))  and  T  is  the  rota- 

fc  °g2  [  n  J  k  °ga  °g2  n  tion  of  the  sequence.  Let  H(pm(xn))  be  the  empirical  m-step 

,,  ,  ,  x  / .  .  Markov  entropy  function  estimated  from  xn .  We  have  the 

+  o  (l062lfc0g2M  +  o  UJ  .  following  theorem. 

tj n. —  / — \  i_ /■ — m  a ifcfnl  ^  ..  i  in  Theorem  3 


tropy  function  estimated  from  xn . 

Theorem  1  Let  £  be  a  k-block  permutation.  Then 

<  \H{jpk(xn))  +  ilog2fc 


1  ,  T  2k\A\k 
+  rl°g2  1  +  -^rL 

K  Tl 


, 1 ,  ,  r,  ,  2fc|Aifc 

+  -log2log2  1  +  -^- 


o(!M)+o(t). 


When  fc(n)  satisfies  fc(n)|A|fc^  <  n  <  [fc(n)  +  l]|A|^n')+1, 

.  ^  I  -l  [l°g2  Wjggg  log2^ 


+  0  /log2]og2_log2n'\ 

V  log2  n  J  ‘ 


When $\xi$ is the identity map, this theorem reduces to the result in [2]. It follows from the theorem that the algorithm is asymptotically optimal for infinite-length sequences, for stationary ergodic sources in the almost-sure sense, and for Asymptotically Mean Stationary (AMS) sources in the average and almost-sure sense.

The next theorem gives a lower bound on the redundancy.

Theorem  2  Let  £  be  a  bijective  k-block  permutation.  Then 
there  exist  0  <  h  <  log2  |«4|  and  x "  such  that 

£/(*fctt(*n)))  >  \H{jpk{xn))  +  \  log2  fc 


U(*k(xn))  <  H(r(xqk))  +  i  log2  fc 

1,  r  2fc|Ar|AN  ,1,  ,  r  ,  2fc|^r|AN 

+  fc  l0g2 11  + - n  J  +  fc  l0g2  l0g2 11  +  n  J 

+  o  (log>g2*)  +  o  +  o  (£)  . 

When  fc(n)  satisfies  fc(u)|A|fe(n)  <  n  <  (fc(n)  4-  l]|-4|fc(n)+1, 


(*"))  <  min  H{r{xq 

71  0  <m<k(n)  1 


m[log2  \A\f 


[l°g2  |-4.|]  Iog2  log2  n 


Bog2log2  log  2 


When  k(n)  satisfies  fc(n)|A|*^n'  <  n  <  [fc(n)  +  l]|.A|fc(n) 


;%k(^n))) 

n 


>  j;H(Pk(xn))  + 


[Iog2  I-4Q  log2  lpg2  71 


(log2n)  ' 


It  should  be  noted  here  that  m  is  automatically  optimized 
by  the  BWT  and  we  don’t  need  to  select  it. 
Abstract — Let $\alpha$ be a primitive element of $F_{3^n}$. Let $d = 3^{2k} - 3^k + 1$ where $n = 3k$. We show that the ternary sequence $\{s(t)\}$ given by $s(t) = Tr_n(\alpha^t + \alpha^{dt})$ has a two-level ideal autocorrelation function.

I.  Introduction 

Given a sequence $\{s(t)\}$ of period $e$ with elements from a finite field $F_p$, the autocorrelation of the sequence at shift $\tau$ is defined by

$$a(\tau) = \sum_{t=0}^{e-1} \omega^{s(t+\tau) - s(t)}$$

where $\omega$ is a complex $p$th root of unity.
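The definition can be checked on a small case. The sketch below does not construct the new sequence of this paper; it uses a ternary m-sequence of period $3^2 - 1 = 8$, generated by the recurrence $s(t+2) = 2s(t+1) + s(t) \bmod 3$ (characteristic polynomial $x^2 + x + 2$, primitive over $F_3$). The function names and the choice $\omega = e^{2\pi i/3}$ are assumptions of this illustration.

```python
import cmath


def ternary_m_sequence(length=8):
    """One period of the m-sequence s(t+2) = 2 s(t+1) + s(t) mod 3."""
    s = [0, 1]  # any nonzero initial state gives a shift of the same sequence
    while len(s) < length:
        s.append((2 * s[-1] + s[-2]) % 3)
    return s[:length]


def autocorrelation(s, tau, p=3):
    """a(tau) = sum over one period of omega^(s(t + tau) - s(t))."""
    e = len(s)
    omega = cmath.exp(2j * cmath.pi / p)  # a complex p-th root of unity
    return sum(omega ** ((s[(t + tau) % e] - s[t]) % p) for t in range(e))
```

For this sequence the in-phase value is $a(0) = 8$ and every out-of-phase value equals $-1$, which is the two-level ideal autocorrelation property.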

An important problem in sequence design is to find sequences with two-level ideal autocorrelation, i.e., where $a(\tau) = -1$ for any $\tau \ne 0$. Recently, much progress has been obtained for binary sequences of period $e = 2^n - 1$. These are of considerable interest also because of their close connections to difference sets. For recent work on binary sequences with two-level ideal autocorrelation function the reader is referred to [3], [6], [7], [8], [2] and [1].

Motivated by these results we focused our attention on ternary sequences. To our knowledge, the only non-binary sequences over the alphabet $F_p$ of length $p^n - 1$ with ideal autocorrelation are the m-sequences and the GMW-sequences. This paper presents one new family of ternary sequences with ideal autocorrelation.

II.  Main  result 

It is known that the crosscorrelation function takes on three values in the case $d = p^{2k} - p^k + 1$ when $n/\gcd(n, k)$ is odd. When $p = 2$ this result is usually attributed to Welch even though he never published a proof. In Kasami [5] a proof can be found in the binary case. For $p > 2$ the proof is given in Trachtenberg [11].

The  following  is  the  main  result. 

Theorem 1. Let $d = 3^{2k} - 3^k + 1$, $n = 3k$ and let $\{s(t)\}$ be the ternary sequence given by

$$s(t) = Tr_n(\alpha^t + \alpha^{dt})$$

where $\alpha$ is a primitive element of $F_{3^n}$. Then the sequence $\{s(t)\}$ has ideal two-level autocorrelation.

The autocorrelation of the sequence above will equal the crosscorrelation of two m-sequences that differ by this decimation and thus will be at most three-valued. The purpose

'This  work  was  supported  in  part  by  The  Norwegian  Research 
Council  under  grant  numbers  127203/410  and  119390/431  and 
in  part  by  the  National  Science  Foundation  under  Grant  NCR- 
9612864. 


of this paper is to show that the autocorrelation of this sequence has only one out-of-phase value, being equal to $-1$. By observing that the trace function of the sequence can be expressed as a quadratic form over $F_{p^k}$, we found the number of solutions and thereby proved the theorem.

For further references on the crosscorrelation of m-sequences the reader is referred to [4], [10], [5], [11] and [9].

References 

[1] J. Dillon, “Multiplicative difference sets via additive characters”, Designs, Codes and Cryptography, vol. 17, pp. 225-235, 1999.

[2] J. Dillon and H. Dobbertin, “New cyclic difference sets with Singer parameters”, submitted for publication.

[3] R. Evans, H. Hollmann, C. Krattenthaler and Q. Xiang, “Gauss sums, Jacobi sums, and p-ranks of cyclic difference sets”, Journal of Combin. Theory Ser. A., to appear.

[4] T. Helleseth, “Some results about the cross-correlation function between two maximal linear sequences”, Discrete Math., vol. 16, pp. 209-232, 1976.

[5] T. Kasami, “The weight enumerators for several classes of subcodes of the 2nd order Reed-Muller codes”, Information and Control, vol. 18, pp. 369-394, 1971.

[6] A. Maschietti, “Difference sets and hyperovals”, Designs, Codes and Cryptography, vol. 14, pp. 89-98, 1998.

[7] J. S. No, H. Chung and M. S. Yun, “Binary pseudorandom sequences of period $2^m - 1$ with ideal autocorrelation generated by the polynomial $z^d + (z+1)^d$”, IEEE Trans. Inform. Theory, vol. 44, pp. 1278-1282, 1998.

[8] J. S. No, S. W. Golomb, G. Gong, H. K. Lee and P. Gaal, “Binary pseudorandom sequences of period $2^n - 1$ with ideal autocorrelation”, IEEE Trans. Inform. Theory, vol. 44, pp. 814-817, 1998.

[9] Y. Niho, “Multi-valued cross-correlation functions between two maximal linear recursive sequences”, Ph.D. Thesis, University of Southern California, Los Angeles, USA, 1972.

[10] D. V. Sarwate and M. B. Pursley, “Crosscorrelation properties of pseudorandom and related sequences”, Proc. IEEE International Symposium Inform. Theory vol. 68, pp. 593-619, 1980.

[11] H. M. Trachtenberg, “On the cross-correlation functions of maximal linear sequences”, Ph.D. Thesis, University of Southern California, Los Angeles, USA, 1970.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


Ternary m-Sequences with Three-Valued Crosscorrelation Function:
Two New Decimations


Hans Dobbertin
German Information Security Agency
P.O. Box 20 0363
D-53133 Bonn, Germany
e-mail: dobbertin@skom.rhein.de

Tor Helleseth¹
Dept. of Informatics
University of Bergen
Høyteknologisenteret
N-5020 Bergen, Norway
e-mail: torh@ii.uib.no

Vijay Kumar
Com. Science Institute
El. Engineering-Systems
USC, Los Angeles
CA 90089-2565, USA
e-mail: vijayk@usc.edu

Halvard M. Martinsen¹
Dept. of Informatics
University of Bergen
Høyteknologisenteret
N-5020 Bergen, Norway
e-mail: halvard@ii.uib.no


Abstract — We show that the crosscorrelation between two ternary m-sequences of period $3^n - 1$ that differ by the decimation $d = 2 \cdot 3^m + 1$, where $n = 2m + 1$, takes on 3 different values. We conjecture the same result for the decimation $d = 2 \cdot 3^r + 1$, where $n$ is odd and $r$ is defined by the condition $4r + 1 \equiv 0 \bmod n$. These two new cases form in a sense ternary counterparts of two recently confirmed binary cases, the conjectures of Welch and Niho.

I.  Introduction 

Let $\{u(t)\}$ and $\{v(t)\}$ be two sequences of period $\varepsilon$ with symbols from $F_p$, the finite field of $p$ elements. The crosscorrelation of the sequences $\{u(t)\}$ and $\{v(t)\}$ is defined as

$$\theta_{u,v}(\tau) = \sum_{t=0}^{\varepsilon-1} \omega^{u(t+\tau) - v(t)},$$

where $\omega$ is a complex primitive $p$th root of unity.

If $\{u(t)\}$ and $\{v(t)\}$ are two cyclically distinct m-sequences of period $p^n - 1$ with symbols from $F_p$, we may assume without loss of generality that there exists a $d$ such that $\gcd(d, p^n - 1) = 1$ and that

$$u(t) = \mathrm{Tr}_n(\alpha^t) \quad \text{and} \quad v(t) = \mathrm{Tr}_n(\alpha^{dt})$$

for some primitive element $\alpha$ of the finite field $F_{p^n}$, where $\mathrm{Tr}_n$ denotes the trace function from $F_{p^n}$ to $F_p$. We use $C_d(\tau)$ to denote the crosscorrelation function between the m-sequence $\{s(t)\}$ and its decimation $\{s(dt)\}$.

It has long been known that the crosscorrelation between two $F_p$-valued m-sequences of period $p^n - 1$ that differ by a decimation $d$ takes on 3 different values for the Gold type decimation $d = \tfrac{1}{2}(p^{2k} + 1)$ (which can be replaced by $d = 2^k + 1$ for $p = 2$) and the Kasami-Welch-Trachtenberg type decimation $d = p^{2k} - p^k + 1$ if $n/\gcd(k, n)$ is odd. We even get a preferred, i.e. three-valued and minimal, crosscorrelation function in these cases if $\gcd(k, n) = 1$.

In the binary case old conjectures of Welch [5] and Niho [7] have recently been confirmed in part by [1], [4], [3] and [6], which give two additional decimations with a preferred crosscorrelation function for each odd $n$. Apart from the above-mentioned cases, no other decimations for odd $n$ are known to have a preferred crosscorrelation function. In particular, no further decimations have been found by computer experiments. Two further cases for even $n$ can be found in [2].

¹This work was supported in part by The Norwegian Research Council under grant numbers 127203/410 and 119390/431.


II.  Main  results 

Our  main  result  is  the  following  theorem,  loosely  speaking 
the  “ternary  Welch  conjecture” : 

Theorem A. Let $d = 2 \cdot 3^m + 1$, where $n = 2m + 1$. Then the crosscorrelation function $C_d(\tau)$ is preferred, i.e. it takes on the following three values:

$-1 + 3^{m+1}$ occurs $\tfrac{1}{2}(3^{n-1} + 3^m)$ times

$-1$ occurs $3^n - 3^{n-1} - 1$ times

$-1 - 3^{m+1}$ occurs $\tfrac{1}{2}(3^{n-1} - 3^m)$ times.
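The value distribution in Theorem A can be checked numerically for the smallest case $n = 3$, $m = 1$, $d = 7$. The following is a minimal sketch, not the authors' method: it assumes the primitive polynomial $x^3 + 2x + 1$ over $F_3$ (our choice, any primitive cubic would do) to generate the m-sequence by a linear recurrence, and the function names are hypothetical.

```python
import math
from collections import Counter

def ternary_m_sequence(period=26):
    # Linear recurrence s(t+3) = s(t+1) + 2*s(t) (mod 3), obtained from
    # x^3 = x + 2, i.e. the assumed primitive polynomial x^3 + 2x + 1 over F_3.
    s = [0, 0, 1]
    while len(s) < period:
        s.append((s[-2] + 2 * s[-3]) % 3)
    return s

def crosscorrelation_distribution():
    # Smallest case of Theorem A: n = 3, m = 1, d = 2*3^m + 1 = 7.
    n, m = 3, 1
    period = 3 ** n - 1
    d = 2 * 3 ** m + 1
    assert math.gcd(d, period) == 1
    s = ternary_m_sequence(period)
    values = []
    for tau in range(period):
        total = 0.0
        for t in range(period):
            diff = (s[(t + tau) % period] - s[(d * t) % period]) % 3
            # Only the real part of omega^diff is accumulated; since d is odd
            # the crosscorrelation is real, so nothing is lost.
            total += math.cos(2.0 * math.pi * diff / 3.0)
        values.append(round(total))
    return Counter(values)

if __name__ == "__main__":
    for value, count in sorted(crosscorrelation_distribution().items()):
        print(value, "occurs", count, "times")
```

For $n = 3$ the theorem predicts the values $8$, $-1$ and $-10$ with multiplicities $6$, $17$ and $3$, which sum to the period $26$.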

Our  proof  of  Theorem  A  follows  in  principle  the  same  basic 
steps  as  the  proof  of  the  binary  case.  We  were  not  able  to 
prove  the  following  “ternary  Niho  conjecture” . 

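Conjecture B. Let $d = 2 \cdot 3^r + 1$, where $n = 2m + 1$ and

$$r = \begin{cases} \dfrac{n-1}{4} & \text{if } n \equiv 1 \pmod 4, \\[4pt] \dfrac{3n-1}{4} & \text{if } n \equiv 3 \pmod 4, \end{cases}$$

then the crosscorrelation function $C_d(\tau)$ is preferred.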
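The exponent $r$ in Conjecture B is fixed by the condition $4r + 1 \equiv 0 \bmod n$ stated in the abstract. A small sketch (the function name is hypothetical) that computes $r$ and the decimation $d$ for odd $n$ and confirms this condition:

```python
def niho_type_parameters(n):
    # Returns (r, d) for odd n, with r chosen as in Conjecture B.
    if n % 2 == 0:
        raise ValueError("n must be odd")
    r = (n - 1) // 4 if n % 4 == 1 else (3 * n - 1) // 4
    d = 2 * 3 ** r + 1
    return r, d

if __name__ == "__main__":
    for n in (3, 5, 7, 9, 11):
        r, d = niho_type_parameters(n)
        print(n, r, d, (4 * r + 1) % n)
```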

Abstract — We introduce three definitions of quaternary codes which are based on a biologically motivated measure of sequence similarity for quaternary n-sequences, extending Hamming similarity. The corresponding codes are used in bio-molecular experiments with DNA sequences. One of the codes is based on a distance function, extending Hamming distance. We discuss upper and lower bounds on the rates of these codes.

I.  Notations 

Consider the quaternary alphabet $A = \{0, 1, 2, 3\}$ and denote by $A^n$ the set of words $w = (w_1, \dots, w_n)$, $w_i \in A$. For any two words $x, y \in A^n$, we define a similarity function

$$S(x, y) = \sum_{i=1}^{n} c(x_i, y_i), \qquad (1)$$

where the alphabetic similarities are $c(0,0) = c(3,3) = 3$, $c(1,1) = c(2,2) = 2$ and $c(x, y) = 0$ for $x \ne y$.

The value $S(x, x)$ is called the self-similarity of the word $x$, and the value $S(x, y)$ for $x \ne y$ is called the cross-similarity between the sequences $x$ and $y$.

Using (1), we define a DNA distance on $A^n \times A^n$:

$$D(x, y) \triangleq \frac{S(x, x) + S(y, y)}{2} - S(x, y). \qquad (2)$$

For any $x = (x_1, \dots, x_n) \in A^n$, we introduce its reverse complementary word

$$\tilde{x} = (\bar{x}_n, \bar{x}_{n-1}, \dots, \bar{x}_1),$$

where the alphabetic complements have the form $\bar{0} = 3$, $\bar{1} = 2$, $\bar{2} = 1$ and $\bar{3} = 0$.
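The notation above translates directly into code. The sketch below is illustrative only; the function names are ours, and the DNA distance is taken as half the sum of the two self-similarities minus the cross-similarity.

```python
# Alphabetic self-similarities c(a, a); c(a, b) = 0 whenever a != b.
SELF_SIMILARITY = {0: 3, 1: 2, 2: 2, 3: 3}

def similarity(x, y):
    # S(x, y): sum of c(x_i, y_i) over all positions.
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(SELF_SIMILARITY[a] for a, b in zip(x, y) if a == b)

def dna_distance(x, y):
    # D(x, y) = (S(x, x) + S(y, y)) / 2 - S(x, y).
    return (similarity(x, x) + similarity(y, y)) / 2 - similarity(x, y)

def reverse_complement(x):
    # Reverse the word and complement each letter: 0 <-> 3, 1 <-> 2.
    return tuple(3 - a for a in reversed(x))

if __name__ == "__main__":
    x, y = (0, 1, 2, 3), (0, 2, 2, 0)
    print(similarity(x, y), dna_distance(x, y), reverse_complement((0, 0, 1, 3)))
```

Note that the word $(0, 1, 2, 3)$ is its own reverse complement, so it could not belong to a code satisfying condition 1) of the definitions that follow.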

II.  Definitions 

Definition 1. A set of words $C \subset A^n$ is called a reverse-complement code of DNA distance $D$ if the following two conditions hold:

1) for any word $x \in C$, its reverse complementary word satisfies $\tilde{x} \ne x$ and $\tilde{x} \in C$;

2) $D(x, y) > D$ for any $x \ne y$, $x, y \in C$.

Definition 2. A set of words $C \subset A^n$ is called a reverse-complement code with similarity parameters $(S_1, S_2)$ if the reverse-complement condition 1) holds and

2') $S(x, x) > S_1$ for any $x \in C$ and $S(x, y) < S_2$ for any $x \ne y$, $x \in C$, $y \in C$.

¹This paper was supported by the US Department of Energy
98-01-00241.

Definition 3. A set of words $C \subset A^n$ is called a reverse-complement code with similarity threshold $\Delta$ if the reverse-complement condition 1) holds and

2'') $S_1 - S_2 > \Delta$, where $S_1$ is the least self-similarity and $S_2$ is the largest cross-similarity in the set $C$, see 2').

III.  Bounds  on  the  Rate 

Denote by $t(n, D)$, $t'(n, S_1, S_2)$ and $t''(n, \Delta)$ the maximum possible sizes of the codes defined above. Let $n \to \infty$, $D \sim nd$, $S_1 \sim ns_1$, $S_2 \sim ns_2$ and $\Delta \sim n\delta$, where $d$, $s_1$, $s_2$ and $\delta$ are fixed. Introduce the rates of these codes

$$R(d) \triangleq \limsup_{n \to \infty} \frac{\log_2 t(n, D)}{n},$$

$$R'(s_1, s_2) \triangleq \limsup_{n \to \infty} \frac{\log_2 t'(n, S_1, S_2)}{n},$$

$$R''(\delta) \triangleq \limsup_{n \to \infty} \frac{\log_2 t''(n, \Delta)}{n}.$$

Theorem 1 (Plotkin bound). If $d > 1.9$, then $R(d) = 0$. If $0 < d < 1.9$, then

$$R(d) \le \overline{R}(d) \triangleq 2\,(1 - \dots).$$

Let $m(p) = 1 + 6p - 10p^2 \le m(3/10) = 1.9$, $0 < p < 1/2$,

$$\mu(h, p) \triangleq \log_2\left(2p^2(1 + 2^{3h}) + 2q^2(1 + 2^{2h}) + 8pq\,2^{2.5h}\right),$$

where $q = \tfrac{1}{2} - p$. For fixed $p \in (0, 1/2)$, consider the function $E(p, d) > 0$, $0 < d < m(p)$, defined by the parametric equations

$$d = \mu'_h(h, p), \qquad E(p, d) = h\,\mu'_h(h, p) - \mu(h, p), \qquad h < 0.$$

Theorem 2 (random coding bound). If $0 < d < 1.9$, then

$$R(d) \ge \underline{R}(d) \triangleq \max_{m(p) > d} E(p, d) > 0,$$

where the maximum is taken over $p$, $0 < p < 1/2$, for which $0 < d < m(p)$.
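The constants in these bounds can be sanity-checked under one interpretation that is our assumption, not a statement from the text: letters 0 and 3 are drawn independently with probability $p$ each and letters 1 and 2 with probability $q = 1/2 - p$ each. The per-letter DNA distance is then 3, 2 or 2.5 for unequal pairs inside $\{0, 3\}$, inside $\{1, 2\}$, or mixed, and 0 for equal letters. The sketch below (hypothetical function names) checks that the mean per-letter distance equals $m(p)$, that $m(3/10) = 1.9$, and that $\mu(0, p) = 0$.

```python
import math
from fractions import Fraction

def mean_letter_distance(p):
    # Expected per-letter DNA distance under the assumed letter distribution.
    q = Fraction(1, 2) - p
    return 2 * p * p * 3 + 2 * q * q * 2 + 8 * p * q * Fraction(5, 2)

def m(p):
    # Closed form quoted in the text.
    return 1 + 6 * p - 10 * p * p

def mu(h, p):
    # log2 of the moment generating function E[2^(h * distance)] per letter.
    q = 0.5 - p
    return math.log2(2 * p * p * (1 + 2 ** (3 * h))
                     + 2 * q * q * (1 + 2 ** (2 * h))
                     + 8 * p * q * 2 ** (2.5 * h))

if __name__ == "__main__":
    print(m(Fraction(3, 10)), mean_letter_distance(Fraction(3, 10)), mu(0.0, 0.3))
```

Since $m'(p) = 6 - 20p$ vanishes at $p = 3/10$, the value $1.9$ is the largest mean distance attainable in this family, which matches the threshold in the Plotkin bound.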

Abstract — Orthogonal frequency division multiplexing (OFDM) is a common technique in multicarrier communications. Whereas most results so far relate to 4-PSK-based OFDM, this contribution introduces a new approach using the 16-QAM scheme.

I.  Basics 

The transmission of a signal in an OFDM system is based on equally spaced, phase-shifted sinusoidal carriers. In a 4-PSK modulation, for example, four distinct phase shifts are used, and the OFDM signal of a word $x \in \mathbb{Z}_4^n$ is given by the real part of the function

$$S_x(t) := \sum_{j=0}^{n-1} i^{x_j} \exp[2\pi i (f_0 + j\Delta f)t],$$

where $f_0 + j\Delta f$ are the carrier frequencies and $i = \sqrt{-1}$. The 16-QAM (quadrature amplitude modulation) is a signal set that allows a convenient representation as a product of two 4-PSK modulations. Here, after a normalization, $R = \sqrt{2}$ is the distance between the origin and the center of one 4-PSK circle and $r = \tfrac{1}{2}\sqrt{2}$ is the radius of the smaller circles.

Figure: 16-QAM as a product of two 4-PSK

Then the OFDM signal is given by the function

$$S_{x,y}(t) := \sum_{j=0}^{n-1} \left(R\, i^{x_j} + r\, i^{y_j}\right) \exp[2\pi i (f_0 + j\Delta f)t],$$


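where $x, y \in \mathbb{Z}_4^n$. The instantaneous envelope power is defined as $P_{x,y}(t) := |S_{x,y}(t)|^2$, and the peak-to-mean envelope power ratio (PMEPR) of a set $Z \subset \mathbb{Z}_4^n \times \mathbb{Z}_4^n$ is given by

$$\mathrm{PMEPR}(Z) := \frac{\sup_{(x,y) \in Z} \sup_t P_{x,y}(t)}{\dfrac{1}{|Z|} \sum_{(x,y) \in Z} \dfrac{1}{T} \int_0^T P_{x,y}(t)\, dt}.$$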


1This  work  was  supported  by  AT&T  Labs-Research,  Florham 
Park,  NJ 


II.  Results 


The aperiodic autocorrelation of a sequence $x \in \mathbb{Z}_4^n$ at displacement $u$ is the function

$$C_x(u) := \sum_{0 \le j,\, j+u \le n-1} i^{x_j - x_{j+u}}.$$

Two sequences $x, y \in \mathbb{Z}_4^n$ form a Golay complementary pair (GCP) if $C_x(u) + C_y(u) = 0$ for each $u \ne 0$. A member of a GCP is called a Golay sequence. It is known that $P_x(t) \le 2n$ for a Golay sequence $x \in \mathbb{Z}_4^n$.
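The Golay property and the resulting power bound can be illustrated on a short example. The sketch below is an illustration under stated assumptions: the pair $x = (0,0,0,2)$, $y = (0,0,2,0)$ is our choice (the quaternary image of a classical binary pair of length 4), the function names are hypothetical, and the time variable is replaced by the normalized phase $\theta = \Delta f \cdot t$, which leaves the envelope power unchanged.

```python
import cmath

# Powers of the imaginary unit, stored exactly: i^0, i^1, i^2, i^3.
I_POW = [1, 1j, -1, -1j]

def aperiodic_autocorrelation(x, u):
    # C_x(u) = sum over j of i^(x_j - x_{j+u}), for 0 <= u < len(x).
    return sum(I_POW[(x[j] - x[j + u]) % 4] for j in range(len(x) - u))

def is_golay_pair(x, y):
    return all(
        aperiodic_autocorrelation(x, u) + aperiodic_autocorrelation(y, u) == 0
        for u in range(1, len(x))
    )

def envelope_power(x, theta):
    # |S_x|^2 as a function of theta = delta_f * t (f_0 only adds a phase).
    s = sum(I_POW[a % 4] * cmath.exp(2j * cmath.pi * j * theta)
            for j, a in enumerate(x))
    return abs(s) ** 2

if __name__ == "__main__":
    x, y = (0, 0, 0, 2), (0, 0, 2, 0)
    print(is_golay_pair(x, y))
    print(max(envelope_power(x, k / 256) for k in range(256)))
```

For a complementary pair the two envelope powers add up to $2n$ at every instant, which is why each of them separately stays at or below $2n$.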

Theorem: For a GCP $(x, y)$ of length $n$ there holds $P_{x,y}(t) \le 5n$. If $x$ and $y$ are Golay sequences (not necessarily forming a GCP) then $P_{x,y}(t) \le 10n$.

If $C \subset \mathbb{Z}_4^n$ is invariant under the translation by the all-2-sequence, then for $Z := C \times C$ we have

$$\frac{1}{|Z|} \sum_{(x,y) \in Z} \frac{1}{T} \int_0^T P_{x,y}(t)\, dt = 2.5\, n.$$

Theorem: Let $C \subset \mathbb{Z}_4^n$ be a set of Golay sequences that is invariant under the translation by the all-2-sequence. Then the PMEPR for $Z := C \times C$ is bounded by 4.
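The $5n$ bound for a Golay complementary pair can be probed numerically. This is a sketch under assumptions: the pair $x = (0,0,0,2)$, $y = (0,0,2,0)$ and the sampling grid are our choices, the function names are hypothetical, and the 16-QAM symbol on carrier $j$ is taken as $R\, i^{x_j} + r\, i^{y_j}$ with $R = \sqrt{2}$ and $r = \sqrt{2}/2$.

```python
import cmath
import math

I_POW = [1, 1j, -1, -1j]
R_BIG = math.sqrt(2.0)          # distance from origin to a 4-PSK circle center
R_SMALL = math.sqrt(2.0) / 2.0  # radius of the smaller circles

def qam_envelope_power(x, y, theta):
    # |S_{x,y}|^2 as a function of the normalized phase theta = delta_f * t.
    s = sum((R_BIG * I_POW[a % 4] + R_SMALL * I_POW[b % 4])
            * cmath.exp(2j * math.pi * j * theta)
            for j, (a, b) in enumerate(zip(x, y)))
    return abs(s) ** 2

def sampled_peak_power(x, y, samples=512):
    return max(qam_envelope_power(x, y, k / samples) for k in range(samples))

if __name__ == "__main__":
    x, y = (0, 0, 0, 2), (0, 0, 2, 0)
    print(sampled_peak_power(x, y), 5 * len(x))
```

The bound is consistent with the Cauchy-Schwarz inequality: $|R\,S_x + r\,S_y|^2 \le (R^2 + r^2)(|S_x|^2 + |S_y|^2) = 2.5 \cdot 2n$.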

Abstract — Linear weighted multistage parallel interference cancellation (PIC) implements exactly the family of polynomial expansion detectors. For long-code CDMA, a set of optimal weights is found which minimizes the ensemble averaged mean squared error (MSE) over random codes. The weights are dependent on moments of the eigenvalues of the correlation matrix, where exact expressions are derived. The loss incurred by averaging rather than using the optimal, time-varying weights is practically negligible.


I.  Introduction 


Consider a $K$-user symbol-synchronous CDMA system with processing gain $N$. The received signal vector is $\mathbf{r} = \mathbf{A}\mathbf{d} + \mathbf{n}$, where $\mathbf{A} = (\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_K)$ is the matrix containing all users' spreading codes, $\mathbf{d} = (d_1, d_2, \dots, d_K)^T$ the data vector and $\mathbf{n}$ the AWGN with variance $\sigma^2$.

The detailed structure of the $i$th PIC stage with weight $\mu_i$ is depicted here, where MF denotes matched filtering. A multistage PIC is a simple cascade of $m$ PIC stages, whose output is

$$\mathbf{y}_m = \left[\mathbf{I} - \prod_{i=1}^{m} \left(\mathbf{I} - \mu_i(\mathbf{R} + \sigma^2 \mathbf{I})\right)\right] (\mathbf{R} + \sigma^2 \mathbf{I})^{-1} \mathbf{A}^H \mathbf{r}, \qquad (1)$$

where $\mathbf{R} = \mathbf{A}^H \mathbf{A}$ is the correlation matrix. By choosing an appropriate set of weights, the PIC can implement exactly any detector of the form of a polynomial in $\mathbf{R}$ applied to the code matched-filtered output $\mathbf{A}^H \mathbf{r}$.

It has previously been shown that the PIC is a realization of the steepest descent algorithm used to minimize the MSE. Following this interpretation, a unique set of weights, dependent on the eigenvalues of $\mathbf{R}$, was found to lead to the minimum achievable MSE for a given number of stages in a short-code system [1]. This approach is too complex for long-code systems. Instead, we consider using a set of code-invariant weights designed to give the minimum ensemble averaged MSE over random codes.

The ensemble average of the excess MSE, as compared to the MMSE, is expressed as a function of $\boldsymbol{\mu} = (\mu_1, \mu_2, \dots, \mu_m)$:

$$J^{(m)}(\boldsymbol{\mu}) = E\left\{ \sum_{k=1}^{K} \frac{\lambda_k}{\lambda_k + \sigma^2} \prod_{i=1}^{m} \left|1 - \mu_i(\lambda_k + \sigma^2)\right|^2 \right\}, \qquad (2)$$

where the $\lambda_k$'s are the eigenvalues of $\mathbf{R}$ and the expectation is taken over random codes. By an elementary symmetric polynomial transform in $\boldsymbol{\mu}$ we can rewrite (2) as a quadratic function in a vector $\mathbf{x}$, which is a function of $\boldsymbol{\mu}$. A unique minimum is then obtained, where the corresponding $\mu_k$'s are found as the inverse polynomial transform.

For an $m$-stage PIC, the weights depend on the first $2m$ moments of the eigenvalues, defined as $M_r = E\{\lambda^r\}$, $r = 1, 2, \dots, 2m$, where $\lambda$ is an arbitrary eigenvalue of $\mathbf{R}$. Moreover,

$$M_r = \frac{1}{K} E\{\mathrm{trace}\{\mathbf{R}^r\}\}
= \frac{1}{K} \sum_{k_1=1}^{K} \sum_{k_2=1}^{K} \cdots \sum_{k_r=1}^{K} E\{R_{k_1 k_2} R_{k_2 k_3} \cdots R_{k_{r-1} k_r} R_{k_r k_1}\}$$

$$= \frac{1}{K} \sum_{k_1=1}^{K} \sum_{k_2=1}^{K} \cdots \sum_{k_r=1}^{K} \; \sum_{n_1=1}^{N} \sum_{n_2=1}^{N} \cdots \sum_{n_r=1}^{N} E\{A^*_{n_1 k_1} A_{n_1 k_2} A^*_{n_2 k_2} A_{n_2 k_3} \cdots A^*_{n_r k_r} A_{n_r k_1}\}. \qquad (3)$$

Since the $A_{nk}$ are all independent random variables, only terms containing all complex conjugate pairs are relevant. $M_r$ is then obtained through evaluation of the summation over all combinations of indices. As the expectation is taken over all code-sets, $M_r$ only depends on $N$ and $K$, but not on specific codes. In fact it is a polynomial in $N$ and $K$ [2].

With the exact expressions derived, the moments can be evaluated easily and the optimal weights computed. The computational complexity is minor for a moderate number of stages and hence can be implemented on-line. Simulation results show that the penalty of averaging rather than using the optimal weights dependent on the instantaneous spreading codes is negligible in most cases of interest.
ity  of.  synchronous  CDMA  for  fading  channels  sup¬ 
posing  exact  channel  knowledge  at  the  receiver  and 
infinite  spreading  factor,  i.e.,  N—>oo  has  been  derived 
by  Shamai  and  Verdu  [1],  On  the  other  hand,  consid¬ 
ering  imperfect  channel  state  information  Evans  and 
Tse  [2]  derived  analytical  solutions  to  the  signal  to 
noise  ratio  provided  by  linear  multiuser  receivers  and 
could  also  give  results  for  the  error  variance  resulting 
from  linear  channel  estimation.  Here,  lower  and  up¬ 
per  bounds  on  the  users’  SIR  reachable  by  means  of  a 
nonlinear  MMSE— receiver  applying  successive  cancel¬ 
lation  while  assuming  a  certain  channel  estimation  ac¬ 
curacy  are  given.  Note  that  due  to  imperfect  channel 
estimation  the  interference  from  previously  decoded 
users  cannot  be  cancelled  completely  even  if  no  de¬ 
coding  errors  occur. 

I.  Lower  and  Upper  Bound  on  SIR 

We consider the synchronous transmission of $K$ users over frequency selective fading channels to a common receiver and model the shifted replicas of the $k$th user's random spreading sequence arriving over the $L$ resolved paths as $L$ independently chosen random spreading sequences $\mathbf{s}_{k,1}[\mu], \dots, \mathbf{s}_{k,L}[\mu]$, $\forall k$, with spreading factor $N$ [2]. (Note that this model holds exactly for the equivalent case of $K$ users transmitting with $L$ antennas over flat fading channels to a single receiver.) Assuming imperfect channel knowledge at the receiver, the path weights are modeled as the sum of a Gaussian distributed MMSE path weight estimate $h_{k,l}[\mu]$ with variance $1/L - J$ and an orthogonal estimation error $n_{k,l}[\mu]$ with power $J$. For the sake of simplicity, we suppose that channel estimation is performed with equal accuracy for all users $1 \le k \le K$, paths $1 \le l \le L$ and time slots. Now, considering user $k$, the signal employed for subsequent processing in a specific time interval $[\mu]$ after cancellation of users $k + 1, \dots, K$ based on their perfectly decoded symbols is

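$$\mathbf{y}_k[\mu] = \sum_{\kappa=1}^{k} \sum_{l=1}^{L} \left(h_{\kappa,l}[\mu] + n_{\kappa,l}[\mu]\right) \mathbf{s}_{\kappa,l}[\mu]\, x_\kappa[\mu] + \sum_{\kappa=k+1}^{K} \sum_{l=1}^{L} n_{\kappa,l}[\mu]\, \mathbf{s}_{\kappa,l}[\mu]\, x^d_\kappa[\mu] + \mathbf{n}[\mu],$$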

where the $N$-dimensional vector $\mathbf{n}$ represents the additive channel noise. The i.i.d. components of $\mathbf{n}$ are zero mean complex Gaussian with variance $\sigma^2$. Further, the users' channel symbols are denoted as $x_k \in \mathcal{X}$, $1 \le k \le K$, having equal power $\sigma_x^2$, and the superscript $d$ marks the already known channel symbols of previously decoded users. Assuming uncorrelated channel estimation errors for the different users as well as paths and time slots, a lower bound on the resulting signal to interference ratio $\mathrm{SIR}_k[\mu]$ at the output of an MMSE filter extracting the signal of user $k$ from $\mathbf{y}_k[\mu]$ can be solved for $N \to \infty$ with $|h_k[\mu]|^2 = \sum_{l=1}^{L} |h_{k,l}[\mu]|^2$ as

Theorem: For arbitrarily long spreading sequences and constant load $\beta$ the normalized signal to interference ratio $\mathrm{SIR}(\alpha = k/N, \beta = K/N) = \mathrm{SIR}_k[\mu]/|h_k[\mu]|^2$ is in probability lower bounded by


where 

7L(a,/d) 


( .  ML-1)J 
y  &2  1  +  7l(o\  0)J 


M/3 


oo 

~a)Lh/ 


C/jixi2(C) 


+  7l(q,^)C 


d£  +a 


OO 

w 


\h\2+J 


(0 


+  7  L(<+/3)C 


dC 


-  I 


Here, $f_{|h|^2+J}(\zeta)$ and $f_{a|x|^2}(\zeta)$ denote the pdf of $|h|^2 + J$ and the pdf of the squared absolute value of the transmit symbols $x \in \mathcal{X}$ multiplied by $a$, respectively.

In  addition,  for  single  path  fading  channels  an  upper  bound 
can  be  given  resulting  from  the  assumption  of  correlated  chan¬ 
nel  estimation  errors. 

Lemma: For a single path fading channel $\mathrm{SIR}(\alpha, \beta)$ is for $N \to \infty$ upper bounded by

$$\mathrm{SIR}(\alpha, \beta) \le \frac{\gamma_U(\alpha, \beta)}{1 + J\gamma_U(\alpha, \beta)},$$


where  7  V,  0)  =  +  a  /  ^-^Kdc) 

Based on the above formulas as well as results on iterative channel estimation we can show that successive cancellation yields considerable gains compared to linear interference suppression even if the channel state is not known exactly at the receiver. It turns out that this advantage depends heavily on the system load as well as the number of propagation paths. Moreover, we can show that also in this case nonorthogonal multiple access can reach a higher spectral efficiency than orthogonal schemes. Finally, it is worth noting that other nonlinear receivers, systems with multiple transmit and receive antennas, as well as the problem of imperfectly known reference symbols for channel estimation can be treated analytically in the same way.

Abstract — The performance of the linear parallel interference cancellation (LPIC) receiver in a synchronous multiuser CDMA system with binary signaling is studied. We show that there exist conditions under which the LPIC receiver underperforms other receivers and characterize its asymptotic behavior.

I.  Introduction  and  Motivation 

The linear parallel interference cancellation (LPIC) receiver has been studied in the literature recently due to its low computational complexity and good performance under certain operating conditions. In this paper, we compare the performance of the LPIC receiver to the hard parallel interference cancellation (HPIC) and conventional matched filter (MF) receivers.

We assume the standard discrete synchronous CDMA system model [1] with $K$ users using binary ($\pm 1$) spreading sequences of length $N$ and binary signaling over an additive white Gaussian noise (AWGN) channel with variance $\sigma^2$. Let $R$ be the $K \times K$ normalized spreading sequence cross-correlation matrix and $A$ be the $K \times K$ diagonal matrix of positive real amplitudes. If the spreading sequences are chosen randomly, we resort to large system techniques [2, 3] to get analytical results. By “large system”, we mean that $K \to \infty$ and $N \to \infty$ but $K/N \to \beta$, for some constant $\beta$.

In parallel interference cancellation (PIC), the desired user's decision statistic is formed by subtracting an estimate of the multiple access interference (MAI) from the original observation of the desired user. PIC lends itself to a multistage implementation in which $M$ stages can be used to generate the final decision statistics. The HPIC receiver generates hard bit decisions at each stage to be used in subsequent stages, while the LPIC receiver passes on soft information.

The goal of this paper is to develop a better understanding of the behavior and performance of the LPIC receiver. Some authors have previously noted the performance limitations of the LPIC and others have suggested improvements. We do not propose to fix the LPIC receiver but rather to understand it better so that we can bound the operating regions where the LPIC receiver exhibits good or bad performance. In that spirit, we present a collection of related analytical results which expose the behavior of the LPIC receiver. We refer to [4] for detailed proofs.

II.  Results  and  Conclusions 

Our  main  results  are  as  follows: 

1. Let $\mathrm{MSE}^{(l)}_{\mathrm{LPIC}}$ and $\mathrm{MSE}^{(l)}_{\mathrm{HPIC}}$ be the mean squared error of the $l$th user's MAI estimate for the two stage LPIC and two stage HPIC receivers respectively. Let $\mathrm{AMSE}^{(l)}_{\mathrm{HPIC}}$ be the approximate MSE derived by using a Gaussian approximation for the MAI. Then for any $R$, $\sigma$, $A$, $K$, and $l$, we show that $\mathrm{MSE}^{(l)}_{\mathrm{LPIC}} > \mathrm{AMSE}^{(l)}_{\mathrm{HPIC}}$.

2. Let $P^{(k)}_{\mathrm{LPIC}}(M)$ and $P^{(k)}_{\mathrm{MF}}$ be the error probabilities for the $M$-stage LPIC and the MF respectively for the $k$th user. Then for any $k$, $M$, $R \ne I$, $\sigma > 0$, and interfering user amplitudes $a^{(l)}$, $\forall l \ne k$, there exists an amplitude threshold $a^* < \infty$ such that $P^{(k)}_{\mathrm{LPIC}}(M) > P^{(k)}_{\mathrm{MF}}$ for $a^{(k)} > a^*$.

3. For any user $k$ in a system with $K > 2$ users, odd $M$, equal amplitude users such that $A = aI$ and $\sigma > 0$, there exists $R$ such that $P^{(k)}_{\mathrm{LPIC}}(M) > 0.5$. We say that the $k$th user suffers.

4. Consider the behavior of the LPIC for large $M$. Let $\rho(R)$ be the spectral radius of $R$, i.e., the maximum magnitude of all eigenvalues of $R$. It is well known that if $\rho(R) < 2$, the LPIC converges to the decorrelating detector. Our result is as follows. If $\rho(R) > 2$, there exists $M^*$ and at least one $k$ such that $P^{(k)}_{\mathrm{LPIC}}(M) > 0.5$ for all odd integer values of $M > M^*$.

5. An extra constraint on $R$ allows us to show that all users can suffer. Suppose $\rho(R) > 2$ and $R$ has an eigenvector, associated with an eigenvalue greater than two, with all nonzero entries. Then there exists $M^*$ such that for all $k$, $P^{(k)}_{\mathrm{LPIC}}(M) > 0.5$ for all odd integer values of $M > M^*$.

6. For randomly chosen spreading sequences and large systems, we show that for any $\sigma$, $A$, and $l$, $E[\mathrm{MSE}^{(l)}_{\mathrm{LPIC}}] > E[\mathrm{MSE}^{(l)}_{\mathrm{HPIC}}]$. Note that, unlike Result 1, we need not rely upon the Gaussian approximation for the MAI.

7. For randomly chosen spreading sequences and large systems, we show that if $\beta = K/N > (\sqrt{2} - 1)^2 \approx 0.17$, then $\rho(R) > 2$ almost surely. Result 4 then indicates that at least one user will suffer in each bit interval for large odd $M$. More precisely, we note that the misperforming user may be different for each realization of $R$, and so no one user need suffer on average. Numerical experiments suggest that, on average, all users may indeed suffer as $M \to \infty$, but a proof of this conjecture is left as an open problem.

¹This research is supported in part by NSF Grants CCR-

The  results  in  this  paper,  which  indicate  that  the  LPIC  has 
the  potential  to  misperform,  are  intended  to  fill  in  some  of  the 
gaps  in  our  understanding  of  PIC  receivers.  We  hope  they 
can  serve  as  cautionary  guidelines  concerning  the  application 
of  LPIC  receivers  to  CDMA  communication  systems. 
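The role of the threshold $\rho(R) = 2$ in Results 4 and 7 can be illustrated with a toy iteration. The sketch below rests on assumptions that are ours: the LPIC stage is modeled as $\mathbf{x}^{(m+1)} = \mathbf{y} - (R - I)\mathbf{x}^{(m)}$ with unit amplitudes and a noise-free input, $R$ is an equicorrelated matrix, and the large-system threshold is derived from the limiting largest eigenvalue $(1 + \sqrt{\beta})^2$ of a random correlation matrix. All function names are hypothetical.

```python
import math

def equicorrelated(K, c):
    # K x K matrix with unit diagonal and all off-diagonal entries equal to c;
    # its largest eigenvalue is 1 + (K - 1) * c.
    return [[1.0 if i == j else c for j in range(K)] for i in range(K)]

def lpic_stages(R, y, stages):
    # Stage 1 is the matched filter output y; each further stage subtracts
    # the interference estimate built from the previous stage's soft outputs.
    K = len(y)
    x = list(y)
    for _ in range(stages - 1):
        x = [y[k] - sum(R[k][j] * x[j] for j in range(K) if j != k)
             for k in range(K)]
    return x

def large_system_threshold():
    # Smallest beta for which (1 + sqrt(beta))^2 reaches 2.
    return (math.sqrt(2.0) - 1.0) ** 2

if __name__ == "__main__":
    y = [1.0, 1.0, 1.0]
    print(lpic_stages(equicorrelated(3, 0.3), y, 80))   # rho(R) = 1.6
    print(lpic_stages(equicorrelated(3, 0.6), y, 41))   # rho(R) = 2.2
    print(large_system_threshold())
```

With $\rho(R) = 1.6$ the outputs settle at the decorrelator solution $R^{-1}\mathbf{y}$, while with $\rho(R) = 2.2$ they oscillate with growing magnitude, changing sign from stage to stage.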

I. Summary

We combine the adaptive (Least Squares) Parallel-Multiuser Decision Feedback Detector for CDMA with short spreading sequences, presented in [1], with iterative (turbo) decoding and soft cancellation, presented in [2]. The resulting receiver requires only a training sequence and (coarse) timing for estimation of all filter coefficients, and performs close to the single-user bound with relatively low $E_b/N_0$. Prior knowledge of spreading codes and channels is unnecessary.

For simplicity, we consider a synchronous CDMA system. The extension to an asynchronous CDMA system with multipath can be achieved by using an expanded observation window [1]. A block diagram of the receiver is shown in Figure 1. Each user's information bits are convolutionally encoded and interleaved before transmission. The received vector of $N$ samples during symbol interval $i$ is $\mathbf{r}(i) = \mathbf{P}\mathbf{d}(i) + \mathbf{n}(i)$, where $\mathbf{P}$ is the matrix of spreading codes, $\mathbf{d}$ is the binary vector of coded symbols across users, and $\mathbf{n}$ is noise. The sequence of vectors $\mathbf{r}(1), \dots, \mathbf{r}(M)$, corresponding to a packet, is the input to the iterative receiver. Referring to the figure, the output of the DFD is

$$\mathbf{y}^{(m)}(i) = (\mathbf{F}^{(m)})^{\dagger}\mathbf{r}(i) - (\mathbf{B}^{(m)})^{\dagger}\hat{\mathbf{d}}^{(m)}(i),$$

where $\mathbf{F}^{(m)}$ and $\mathbf{B}^{(m)}$ are the feedforward and feedback filters, respectively, and $\hat{\mathbf{d}}^{(m)}$ is the vector of soft decisions, all corresponding to the $m$th iteration.

For purposes of MAP decoding, we assume that the residual interference plus noise at the output of the DFD is Gaussian. It is then possible to estimate the a priori probabilities $\Pr[y_k^{(m)}(i) \mid d_k(i) = \pm 1]$ without additional side information. These are deinterleaved and input to the MAP decoder for the convolutional code. The MAP decoder generates the a posteriori probabilities $\Pr[d_k(i) = \pm 1]$, which are used to compute the input to the feedback filter $\hat{\mathbf{d}} = E[\mathbf{d}]$.
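For a binary symbol $d \in \{+1, -1\}$ the soft decision $E[d]$ follows directly from the a posteriori probabilities. The sketch below is a generic illustration rather than the receiver's actual implementation; the function names are hypothetical, and the log-likelihood ratio form is the standard identity $E[d] = \tanh(L/2)$ with $L = \ln(\Pr[d = +1]/\Pr[d = -1])$.

```python
import math

def soft_decision(p_plus):
    # E[d] = (+1) * Pr[d = +1] + (-1) * Pr[d = -1].
    if not 0.0 <= p_plus <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 2.0 * p_plus - 1.0

def soft_decision_from_llr(llr):
    # Same quantity expressed through the log-likelihood ratio.
    return math.tanh(llr / 2.0)

if __name__ == "__main__":
    p = 0.9
    print(soft_decision(p), soft_decision_from_llr(math.log(p / (1.0 - p))))
```

A soft decision near zero signals an unreliable symbol, so little of its estimated interference is cancelled, whereas a hard decision would always cancel at full strength.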

The filters $\mathbf{F}^{(m)}$ and $\mathbf{B}^{(m)}$ are selected to minimize the Least Squares (LS) cost function

$$\varepsilon_{LS} = \sum_{i=0}^{M} \left\| \mathbf{d}(i) - (\mathbf{F}^{(m)})^{\dagger}\mathbf{r}(i) + (\mathbf{B}^{(m)})^{\dagger}\hat{\mathbf{d}}^{(m)}(i) \right\|^2 \qquad (1)$$

at each iteration $m$, and are constant over the duration of the packet. The symbols $\mathbf{d}$ are obtained either from a training sequence or in decision-directed mode. In the latter case, simulation results show that using soft decisions gives better performance than using hard decisions.

Figure 2 shows a plot of packet error rate vs. $E_b/N_0$ for different receivers. For the MMSE Parallel (P)-DFD curve, the filters $\mathbf{F}^{(m)}$ and $\mathbf{B}^{(m)}$ are computed assuming perfect feedback ($\hat{\mathbf{d}} = \mathbf{d}$). The “approximate” LS P-DFD curve is based on minimizing (1), but with soft decisions for $\mathbf{d}$. The results are averaged over random spreading sequences. Data packets contain 500 information symbols and 200 training symbols.

¹This work was partially supported by ARO under Grant DAAD19-99-1-0288 and by Southern-Poro Communications

Figure 1: Iterative P-DFD receiver

Figure 2: Receiver comparison with $K = 12$ users, $N = 16$, code rate $R = 1/2$ (8 chips per coded bit).

Because  of  the  high  load,  the  linear  receiver  has  a  packet 
error  rate  near  one.  The  MMSE  P-DFD  performs  worse  than 
the  LS  P-DFD  since  the  latter  measures  and  exploits  the  joint 
statistics  of  the  soft  estimates  with  the  transmitted  symbols. 
The  break  point  near  3  dB  shown  in  the  Figure  is  close  to  the 
fundamental  limit  based  on  large  system  capacity. 

Abstract — Space-time block code design and decoder design are addressed for Code-Division Multiple-Access (CDMA) systems. Optimal code designs are found by optimizing the Chernoff bound of the probability of decoding error. From this, the diversity gain and the coding gain are determined for the CDMA scenario. The resultant optimal code designs are classified and analyzed. Both optimal and moderate complexity suboptimal decoding algorithms are proposed and evaluated.

I.  Introduction  &  System  Model 

Transmit diversity methods have proven effective for combatting fading in wireless communication systems [1, 2, 3, 4]. In this paper, we focus on determining space-time block coding methods for Code-Division Multiple-Access systems. Due to the assumption of independent fading for different users, the multiuser coding problem decouples into multiple single-user coding problems. The presence of spreading codes in the CDMA problem yields interesting differences in code designs and code metrics relative to the single-user narrowband case [4].

The up-link between K users and one base station is considered.
For each user, $k_c$ information bits are mapped to one
of $2^{k_c}$ Space-Time Block Codes (STBC), D. The codeword
D is a matrix of dimension $L_t \times M$, where M is the number
of transmit antennae and $L_t$ is the duration, in symbol
intervals, of the code. It is assumed that the base station
has N receive antennae. We further assume that: the fading
processes associated with each transmit antenna are independent;
the channel is constant over the duration of the block
code (quasi-static) and is known perfectly; and that the
transmission is synchronous. For the system under consideration,
a different spreading code is employed for each transmit antenna
and it is assumed that the receiver has full knowledge
of these spreading codes.

II. Performance Criteria & Code Designs

It can be shown that optimizing the upper bound on the
probability of decoding error yields two criteria for space-time
block code design. Performance is determined by a key matrix
$\Phi = \Delta D^H \Delta D \odot R$,^1 where $\Delta D$ is a codeword difference
matrix and $R$ is the spreading code cross-correlation matrix.
The resultant design “metrics” over all pairs of codewords are:

diversity gain $\Lambda_d = N r_{\min}$, where $r_{\min}$ is the minimum
rank of $\Phi$.

coding gain $\Lambda_p = \prod_{i=1}^{r_{\min}} \lambda_i$ is the smallest product of
all the non-zero eigenvalues $\lambda_i$ of $\Phi$.

These “metrics” are analogous to those obtained in [1] for
the narrowband case, but due to the presence of the
cross-correlation matrix $R$, some new features appear in the
resulting optimal codes. The goal of code design is to find $2^{k_c}$
distinct STBCs such that $r_{\min}$ is maximized and, given this
rank, that $\Lambda_p$ is also maximized. Codes satisfying these
conditions are deemed optimal. The following two propositions
can be proved regarding diversity gain and coding gain:

PROPOSITION 1 If $\Delta D$ has no zero columns and if $R$ is
positive definite, full diversity gain is always achieved.

PROPOSITION 2 If $R_{ij} = \rho$ for $i \neq j$, the coding gain is
a monotonically decreasing function of $\rho$.

1Here $\odot$ represents Schur product.

2This work was supported by NSF Grant ANI-9809018.

Note that for the narrowband case, $\rho = 1$ and $R$ is thus
singular. For the CDMA case, due to these two propositions, we
focus on maximizing $\Lambda_p$. We observe that non-unitary codes
usually outperform unitary codes. Consider the optimal code
sets for BPSK modulation. We discuss the case of $k_c = 2$,
$M = 2$ and $L_t = 2$. The resulting optimal codes can be
partitioned into three equivalence classes. Each element of an
equivalence class can be transformed into another element via
simple isometries. Each class has a uniform distance spectrum
across codewords. Classes 1 and 2 are optimal for all $|\rho| < 1$
while Class 3 is optimal for $|\rho| < \sqrt{3}/2$. Class 2 is essentially
Alamouti’s orthogonal code set [3] while Class 1 is non-unitary
[4]. Interestingly, for QPSK modulation, we can find an optimal
non-unitary code set which outperforms all unitary codes
for all $|\rho| < 1$. The optimal codes are tabulated below.


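Each codeword is a 2 x 2 matrix, written row by row as [row 1; row 2].

symbol  Class  D1            D2              D3             D4
BPSK    1      [1 1; 1 1]    [1 -1; -1 -1]   [-1 1; 1 -1]   [-1 -1; -1 1]
BPSK    2      [1 1; 1 -1]   [1 -1; -1 -1]   [-1 1; 1 1]    [-1 -1; -1 1]
BPSK    3      [1 1; 1 1]    [1 -1; -1 1]    [-1 1; 1 -1]   [-1 -1; -1 -1]
QPSK           [1 1; 1 i]    [1 -1; -1 i]    [-1 i; -i 1]   [-i -i; i -i]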
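The design metrics of Section II can be checked numerically for the BPSK code sets. The following is a minimal sketch, not the authors' procedure: it assumes the table is read as four 2 x 2 codewords per class, that the key matrix is $\Phi = \Delta D^H \Delta D \odot R$ with $R = [1\ \rho;\ \rho\ 1]$, and it handles only the 2 x 2 case, where the product of the non-zero eigenvalues is the determinant (full rank) or the trace (rank one). The names `CODE_SETS` and `design_metrics` are illustrative.

```python
import itertools

# BPSK code sets as read from the table (assumed layout: [row 1; row 2]).
CODE_SETS = {
    1: [[[1, 1], [1, 1]], [[1, -1], [-1, -1]], [[-1, 1], [1, -1]], [[-1, -1], [-1, 1]]],
    2: [[[1, 1], [1, -1]], [[1, -1], [-1, -1]], [[-1, 1], [1, 1]], [[-1, -1], [-1, 1]]],
    3: [[[1, 1], [1, 1]], [[1, -1], [-1, 1]], [[-1, 1], [1, -1]], [[-1, -1], [-1, -1]]],
}

def key_matrix(Da, Db, rho):
    """Phi = (Da - Db)^H (Da - Db), Schur-multiplied by R = [[1, rho], [rho, 1]]."""
    delta = [[complex(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(Da, Db)]
    gram = [[sum(delta[t][i].conjugate() * delta[t][j] for t in range(2))
             for j in range(2)] for i in range(2)]
    R = [[1.0, rho], [rho, 1.0]]
    return [[gram[i][j] * R[i][j] for j in range(2)] for i in range(2)]

def design_metrics(codewords, rho, tol=1e-9):
    """Return (r_min, smallest product of non-zero eigenvalues) over all pairs."""
    results = []
    for Da, Db in itertools.combinations(codewords, 2):
        phi = key_matrix(Da, Db, rho)
        det = (phi[0][0] * phi[1][1] - phi[0][1] * phi[1][0]).real
        trace = (phi[0][0] + phi[1][1]).real
        if det > tol:
            results.append((2, det))      # full rank: product of both eigenvalues
        elif trace > tol:
            results.append((1, trace))    # rank one: the single non-zero eigenvalue
        else:
            results.append((0, 1.0))
    r_min = min(rank for rank, _ in results)
    gain = min(prod for rank, prod in results if rank == r_min)
    return r_min, gain

for label, code in CODE_SETS.items():
    print(label, [round(design_metrics(code, rho)[1], 2) for rho in (0.0, 0.5, 0.9)])
```

Under these assumptions, Classes 1 and 2 keep the same eigenvalue product for every $|\rho| < 1$, while the Class 3 product starts to fall once $|\rho|$ exceeds $\sqrt{3}/2$, which agrees with the ranges quoted in the text.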

III.  Decoding  Algorithms 

Three types of decoders are considered: the optimal
maximum-likelihood (ML) decoder, a joint multiuser minimum
mean-squared error decoder and a combined interference
cancellation/ML decoder. These algorithms perform as predicted,
with the ML decoder offering the best performance at
the expense of computational complexity. The two suboptimal
algorithms offer solid performance with reduced complexity.
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Abstract  —  Multiple  antennas  can  greatly  increase 
the data rate and reliability of a wireless communication
link in a fading environment, but the practical
success of using multiple antennas depends crucially
on our ability to design high-rate space-time constellations
with low encoding and decoding complexity. It
has been shown that full transmitter diversity, where
the constellation is a set of unitary matrices whose
differences have nonzero determinant, is a desirable
property for good performance.

We use the powerful theory of fixed-point-free
groups and their representations to design high-rate
constellations with full diversity. Furthermore, we
thereby classify all full-diversity constellations that
form a group, for all rates and numbers of transmitter
antennas. The group structure makes the constellations
especially suitable for differential modulation
and low-complexity decoding algorithms.

The classification also reveals that the number of
different group-structures with full diversity is very
limited when the number of transmitter antennas is
large and odd. We therefore also consider extensions
of the constellation designs to nongroups. We conclude
by showing that many of our designed constellations
perform excellently on both simulated and real
wireless channels.

A complete copy of this paper is available on
the web at http://mars.bell-labs.com under the title
“Representation Theory for High-Rate Multiple-Antenna
Code Design.” Other related papers are also
available at this web site.


I.  An  Example  of  a  High-Rate  Code 


As an example of a high-rate group code that we find, we
plot the performance of SL2(F5), the group of 2 x 2 matrices
over the field F5 with determinant one. This group has
a representation as 120 complex 2 x 2 unitary matrices suitable
for transmission over a two-antenna fading channel. The
group is fixed-point-free, which means that its constellation
has full diversity. We also plot the performance of the best
cyclic group with the same rate [2], a 2 x 2 orthogonal design
[4] (which is not a group) and a generalized quaternion group
code [3] with similar rates. All of these codes can be used with
a known channel (as shown), or they can be used differentially
when the channel is unknown, with a performance loss of
approximately 3 dB.


Block-error rate performance of the group SL2(F5) compared
with constellations from other constructions for
M = 2 transmitter antennas and N = 1 receiver antenna.
The channel is known at the receiver. The solid line is
SL2(F5), which has 120 unitary matrices (rate R ≈ 3.45).
The dashed line is an orthogonal design with 11th roots
of unity (R ≈ 3.46). The dashed-dotted line is the best
diagonal (Abelian group) construction (R ≈ 3.45). The
dotted line is the quaternion group with 128 matrices
(R = 3.5).
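The constellation sizes and rates quoted above can be reproduced with a few lines of code. The sketch below assumes the usual rate definition $R = \log_2(\text{constellation size})/M$ for $M$ transmitter antennas, and that the orthogonal design with 11th roots of unity has $11^2 = 121$ elements; the function names are illustrative.

```python
import itertools
import math

def sl2_order(p):
    """Count the 2 x 2 matrices over F_p with determinant one by brute force."""
    return sum(1 for a, b, c, d in itertools.product(range(p), repeat=4)
               if (a * d - b * c) % p == 1)

def rate(constellation_size, num_tx_antennas):
    """Bits per channel use, assuming R = log2(size) / M."""
    return math.log2(constellation_size) / num_tx_antennas

print(sl2_order(5))              # size of SL2(F5)
print(round(rate(120, 2), 2))    # SL2(F5)
print(round(rate(121, 2), 2))    # orthogonal design, 11th roots of unity (assumed 11^2 points)
print(rate(128, 2))              # quaternion group with 128 matrices
```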
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Abstract  —  This  paper  investigates  the  application 
of the expectation-maximization algorithm for systems
with multiple transmit and/or receive antennas
in the presence of fast fading channels.

I.  Introduction 

The  use  of  space-time  coding  and  modulation  techniques 
can  improve  system  performance  and  combat  the  damaging 
effects  due  to  the  presence  of  fading.  Most  work  has  assumed 
that  accurate  estimates  of  current  channel  fading  conditions 
are  available  at  the  receiver  [4,  5,  6].  When  channel  estimation 
becomes  very  challenging,  e.g.  in  fast  fading  channels,  it  is  of 
interest  to  explore  joint  channel  estimation  and  data  detection 
methods  in  order  to  approach  coherent  performance  with  a 
minimum  of  pilot  symbols. 

For single-antenna channels, suboptimal receivers based
on the expectation-maximization (EM) algorithm have been
shown to perform well under fast fading [2] and multipath fading
[3] conditions. The EM algorithm is a general two-step
procedure for iterative maximum likelihood estimation. The
algorithm was first formalized in the statistics literature by
Dempster, Laird, and Rubin [1], and has since been applied
to a variety of communications problems.

We  consider  a  system  with  multiple  transmit  and  receive 
antennas,  and  propose  a  suboptimal  space-time  receiver  based 
on  the  EM  algorithm,  which  performs  iterative  joint  channel 
estimation  and  data  sequence  detection  in  alternating  steps. 
We  derive  simple  expressions  for  these  steps  and  evaluate  the 
performance  of  the  resulting  receiver  for  several  modulation 
techniques. 

II.  Receiver  Structure 

Consider a wireless channel with $t$ transmit and $r$ receive
antennas. The signal sample taken by receive antenna $i$ at
time $k$ can be modeled, for $i = 1, \ldots, r$, $k = 1, \ldots, n$, as

$$y_{ik} = \sum_{j=1}^{t} h_{ij}(k)\, c_{jk} \sqrt{\rho_t} + n_{ik} , \qquad (1)$$

where $c_{jk}$ is the constellation point transmitted, $h_{ij}(k)$ is the
fading path gain, $\rho_t$ is the signal-to-noise ratio and $n_{ik}$ are
noise samples. In matrix form, (1) can be rewritten as

$$Y = \sqrt{\rho_t}\, H_v C_v + N , \qquad (2)$$

where $H_v = [H_1 : H_2 : \cdots : H_n]$ is $r \times nt$, $C_v$ is the $nt \times n$
block-diagonal matrix, and $N$ is the noise matrix. Each row of
$H_v$ is iid with covariance matrix $S$.

1This work was supported in part by the National Science Foundation
under grants CCR-9903107, and by the Center for Advanced
Computing and Communication.

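The receiver iteratively performs two steps: an Expectation
step and a Maximization step. The E-step can be evaluated as

$$Q(C_v \mid C_v^i) = E\left[\log p(Y \mid H_v, C_v) \mid Y, C_v^i\right]$$
$$= -\operatorname{tr}\left\{(Y - \sqrt{\rho_t}\,\hat{H}_v^i C_v)^\dagger (Y - \sqrt{\rho_t}\,\hat{H}_v^i C_v) + r \rho_t\, C_v^\dagger \Sigma^i C_v\right\} - rn \log \pi , \qquad (3)$$

where $\hat{H}_v^i = \sqrt{\rho_t}\, Y C_v^{i\dagger} \Sigma^i$ and $\Sigma^i = (S^{-1} + \rho_t C_v^i C_v^{i\dagger})^{-1}$ are
the channel estimate and the error covariance matrix, respectively,
at the $i$-th iteration.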
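The estimation half of one iteration can be sketched directly from the expressions for the channel estimate and the error covariance. This is an illustration under assumptions, not the authors' receiver: it takes the estimate to be $\sqrt{\rho_t}\, Y C^\dagger \Sigma$ with $\Sigma = (S^{-1} + \rho_t C C^\dagger)^{-1}$, works on a generic small dense matrix $C$ rather than the block-diagonal $C_v$, and uses hand-written helpers (`matmul`, `herm`, `inverse`) so that it runs with the standard library only.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def herm(A):
    """Conjugate transpose."""
    return [[complex(x).conjugate() for x in col] for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    aug = [[complex(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def em_channel_estimate(Y, C, S, rho):
    """Return (H_hat, Sigma) for the model Y = sqrt(rho) * H * C + N."""
    Ch = herm(C)
    CCh = matmul(C, Ch)
    S_inv = inverse(S)
    dim = len(S)
    Sigma = inverse([[S_inv[a][b] + rho * CCh[a][b] for b in range(dim)]
                     for a in range(dim)])
    H_hat = [[math.sqrt(rho) * x for x in row]
             for row in matmul(matmul(Y, Ch), Sigma)]
    return H_hat, Sigma

# Noise-free toy example (all values are made up for illustration).
H = [[1 + 2j, -1j], [0.5, 2.0]]
C = [[1, 1, 1, 1], [1, -1, 1, -1]]
S = [[1.0, 0.0], [0.0, 1.0]]
rho = 1e6
Y = [[math.sqrt(rho) * x for x in row] for row in matmul(H, C)]
H_hat, Sigma = em_channel_estimate(Y, C, S, rho)
```

In this toy case $C C^\dagger = 4I$, so the estimate is the true channel scaled by $4\rho/(1 + 4\rho)$ and approaches it as the signal-to-noise ratio grows.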

The M-step calculates the next estimate by $C_v^{i+1} =
\arg\max_{C_v} Q(C_v \mid C_v^i)$. The expression (3) can be recursively
maximized by using the Viterbi Algorithm with a modified
metric which consists of the Euclidean distance metric plus
a quadratic term that depends on the previously decoded
sequence $C_v^i$. Thus, each iteration consists of an estimation
phase followed by a detection phase.

The initial estimate $C_v^0$ for the iterative receiver will
generally be derived using a pilot-symbol-assisted modulation
scheme.

Simulations suggest that the receiver can often achieve
near-coherent performance with only two iterations and using
a very small number of pilot symbols under fast fading
conditions.
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Abstract  —  Recently,  the  authors  presented  a 
new  threaded  space-time  architecture  [1]  [2]  combining 
generalized  layered  transmission,  advanced  iterative 
multi-user  detection  techniques,  and  space-time  code 
design  that  provides  superior  performance  compared 
to  the  layered  architectures  proposed  by  Foschini  et 
al.  [3]  and  Tarokh  et  al.  [7].  In  this  paper,  we  discuss 
the  design  of  algebraic  space-time  codes  for  layered 
and non-layered architectures.

I.  Introduction 

Two essentially different approaches have been proposed for
exploiting the spatial diversity available to systems with multiple
transmit and receive antennas operating over fading channels.
In the first, channel coding is performed using so-called
“space-time codes” [4][6][5]. In the second, conventionally-encoded
data streams are “layered” in space and time by the
transmitter and separated at the receiver using
interference-cancellation and interference-avoidance signal processing [3].

A new “threaded” space-time architecture, introduced in
[1][2], shows that significant gains are possible without undue
complexity, however, when the encoding, interleaving, and
distribution of transmitted symbols among different antennas are
optimized to maximize spatial diversity, temporal diversity,
and coding gain in accordance with space-time code design
principles.

II.  Algebraic  Space-Time  Code  Design 

We consider a communication system with $n$ transmit and $m$
receive antennas. A space-time code $\mathcal{C}$ consists of an underlying
channel code $C$ together with a spatial modulator function
$f$ that parses the modulated symbols among the transmit antennas.
Binary rank criteria developed in [5] made possible
the first designs of space-time codes by algebraic means.

Thm 1 (Stacking Construction) [5] Let $\mathcal{C}$ be the space-time
code of dimension $k$ consisting of the $n \times \ell$ code
word matrices $c = [\bar{x}M_1;\ \bar{x}M_2;\ \cdots;\ \bar{x}M_n]$ (one row per
transmit antenna), where $M_1, M_2, \ldots, M_n$ are binary matrices
of dimension $k \times \ell$ and $\bar{x}$ denotes the $k$-tuple of information
bits. Then, for BPSK transmission over the quasi-static fading
channel, $\mathcal{C}$ achieves full spatial diversity $nm$ provided

$$\forall a_1, a_2, \ldots, a_n \in \mathbb{F}, \text{ not all zero}: \quad
M = a_1 M_1 \oplus a_2 M_2 \oplus \cdots \oplus a_n M_n \text{ is of rank } k \text{ over } \mathbb{F}.$$

Furthermore, if $M_1, M_2, \ldots, M_n$ are $\mathbb{Z}_4$-valued matrices
whose projections modulo 2 satisfy the above constraint, then
the corresponding $\mathbb{Z}_4$ space-time code $\mathcal{C}$ achieves full spatial
diversity for QPSK transmission.
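The binary rank condition of the stacking construction is easy to test by exhaustive search for small $n$. The sketch below is illustrative only: matrices are lists of 0/1 rows, $\mathbb{F}$ is taken to be the binary field, and the names `gf2_rank` and `stacking_condition_holds` as well as the example matrices are assumptions, not taken from [5].

```python
import itertools

def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 rows."""
    basis = []                                  # reduced rows stored as integers
    for row in rows:
        v = int("".join(str(b & 1) for b in row), 2) if row else 0
        for b in basis:
            v = min(v, v ^ b)                   # clear the leading bit of b if set
        if v:
            basis.append(v)
    return len(basis)

def stacking_condition_holds(mats):
    """Check that every non-zero binary combination of the M_i has rank k."""
    k, width = len(mats[0]), len(mats[0][0])
    for a in itertools.product((0, 1), repeat=len(mats)):
        if not any(a):
            continue
        combo = [[sum(ai * m[r][c] for ai, m in zip(a, mats)) % 2
                  for c in range(width)] for r in range(k)]
        if gf2_rank(combo) != k:
            return False
    return True

I2 = [[1, 0], [0, 1]]
B = [[0, 1], [1, 1]]
print(stacking_condition_holds([I2, B]))    # I2, B and I2 + B are all non-singular
print(stacking_condition_holds([I2, I2]))   # I2 + I2 = 0, so the condition fails
```

The search visits $2^n - 1$ combinations, which is fine for a handful of antennas; larger $n$ would call for a structured argument rather than enumeration.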

In a layered architecture, a similar algebraic construction
is applicable to arbitrary signaling constellations Q of size $2^b$.
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Thm  2  (Threaded  Stacking  Construction)  Let  L  be 

a layer of spatial span $n$. Given binary $k \times \ell$ matrices
$M_1, M_2, \ldots, M_n$, let $C$ be the binary code consisting of the
vectors $c(\bar{x}) = \bar{x}M_1 \mid \bar{x}M_2 \mid \cdots \mid \bar{x}M_n$, where $\bar{x}$ denotes a
$k$-tuple of information bits. Let $f_L$ be the spatial modulator in
which the modulated symbols associated with $\bar{x}M_j$ are transmitted
in the $\ell/b$ symbol intervals of $L$ that are assigned to
antenna $j$. Then, the space-time code $\mathcal{C}$ consisting of $C$ and
$f_L$ achieves spatial diversity $dm$ in a quasi-static fading channel
iff $d$ is the largest integer such that

$$\forall a_1, a_2, \ldots, a_n \in \mathbb{F},\ a_1 + a_2 + \cdots + a_n = n - d + 1: \quad
M = [a_1 M_1\ \ a_2 M_2\ \ \cdots\ \ a_n M_n] \text{ is of rank } k \text{ over } \mathbb{F}.$$
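The diversity level $d$ in Theorem 2 can be found by brute force for small examples. The sketch below rests on one reading of the condition that should be treated as an assumption: the sum $a_1 + \cdots + a_n$ is taken as an integer sum, so it counts how many blocks $M_i$ are kept, and zeroed blocks are simply dropped because they do not change the rank. Function names and the example matrices are illustrative.

```python
import itertools

def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 rows."""
    basis = []
    for row in rows:
        v = int("".join(str(b & 1) for b in row), 2) if row else 0
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def threaded_diversity(mats):
    """Largest d such that every choice of n - d + 1 blocks has rank k."""
    n, k = len(mats), len(mats[0])
    for d in range(n, 0, -1):
        weight = n - d + 1
        if all(gf2_rank([sum((mats[i][r] for i in subset), []) for r in range(k)]) == k
               for subset in itertools.combinations(range(n), weight)):
            return d
    return 0

# Three k x 1 blocks (k = 2): any two columns are independent, one alone is not.
M1, M2, M3 = [[1], [0]], [[0], [1]], [[1], [1]]
print(threaded_diversity([M1, M2, M3]))
```

For this toy layer the search stops at $d = 2$ with $n = 3$, so $n - d + 1 = 2$ blocks are needed to recover the $k = 2$ information bits.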

The stacking constructions are general for any number of
antennas and apply to trellis as well as block codes. The observation
in [5] that the stacking construction is readily satisfied
within the class of binary rate $1/n$ convolutional codes is
particularly noteworthy. Indeed, most of the well-known convolutional
codes of rate $1/n$ with optimal $d_{free}$ can be formatted
to achieve full spatial diversity!

Similarly, the natural space-time codes associated with the
general class of binary rate $k/n$ convolutional codes are attractive
candidates for the layered space-time architecture since
they can be easily formatted via periodic bit interleaving to
satisfy the generalized layered stacking construction. In this
case, a total transmission rate of $b(n - d + 1)$ bits per signaling
interval can be achieved, which is the maximum possible.

Example. The natural layered space-time code associated with
the optimal 8-state, $d_{free} = 5$ convolutional code (with generators
$G_0(D) = 1 + D^2$ and $G_1(D) = 1 + D + D^2$) achieves
maximum possible spatial diversity for $n = 2$, 4, and 6 transmit
antennas. The achieved diversity levels are $d = 2$, 3, and
4, respectively, in accordance with Theorem 2.
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Abstract  —  The  bit  error  rate  of  tailbiting  convolu¬ 
tional  encoders  is  compared  for  the  case  that  a  priori 
source information is available, and for the case that
it is not. The set of best encoders depends on the a
priori information, and is not identical to the set of
encoders maximizing minimum distance. Characteristics
that govern the BER are analyzed.

Summary 

The bit error rate (BER) is an important encoder criterion.
The question arises which encoder to use in a specific
environment. We compare the BER performance of tailbiting
convolutional (TB) encoders for the BSC and AWGN channel
with different noise powers when differing amounts of a priori
information are available to the decoder.

A TB encoder must end in the same state in which it started. A
codeword can be viewed as a path around a circular trellis.
TB codes are an efficient tool to supply error protection for
short packets, since they do not suffer any rate loss due to a
terminating zero-tail. In general, they achieve the minimum
distance ($d_{\min}$) of the best known block codes. Besides, TB
codes allow trellis decoding.
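The circular structure of a TB codeword can be made concrete with a small encoder. This is a minimal sketch for a rate 1/2 feedforward encoder; the generators $1 + D^2$ and $1 + D + D^2$ are chosen only as a familiar example and are not taken from the tables in [2] or [3]. Indexing the data word modulo its length is equivalent to preloading the encoder memory with the last $m$ data bits, so the encoder ends in the state in which it started.

```python
GENERATORS = (0b101, 0b111)   # assumed example: 1 + D^2 and 1 + D + D^2
MEMORY = 2                    # encoder memory m

def tb_encode(u):
    """Tailbiting encoding: data indices wrap around, so no zero tail is appended."""
    L = len(u)
    out = []
    for t in range(L):
        for g in GENERATORS:
            bit = 0
            for j in range(MEMORY + 1):
                if (g >> (MEMORY - j)) & 1:     # coefficient of D^j
                    bit ^= u[(t - j) % L]
            out.append(bit)
    return out

u = [1, 0, 1, 1, 0, 0, 1, 0]
c = tb_encode(u)
shifted = u[-1:] + u[:-1]
print(len(c) == 2 * len(u))                     # no rate loss from a tail
print(tb_encode(shifted) == c[-2:] + c[:-2])    # shifting the data shifts the codeword
```

The second check shows the path-around-a-circle picture: a cyclic shift of the data word by one position yields the same codeword shifted by one trellis section.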

The appropriate decoder is the symbol-by-symbol maximum a posteriori
algorithm, denoted as MAP decoder. With trellis decoding,
the well-known BCJR algorithm computes the a posteriori
probability (APP) of each data symbol. The data symbol
with highest APP is chosen as the MAP decoder output. In
order to achieve a lower BER the a priori information is taken
into account. In [1] we extended the original BCJR algorithm
to the TB environment. This algorithm (TB-BCJR) does not
quite obtain the true MAP output, but it is a factor $2^m$ simpler
than the true MAP decoder, where $m$ denotes the memory of
the TB encoder. Here, the BER of rate 1/2 feedforward TB
encoders is measured using the TB-BCJR algorithm.

Let $L$ be the tail-biting length. The $i$th bit of the data word
is denoted by $u_i$, $1 \le i \le L$. Define the source bit probability
$\phi_i = P(u_i = 0)$, $1 \le i \le L$. Assume now that $\phi_i = 1/2$, $\forall i$. A
table of best rate 1/2 TB encoders for the BER criterion is
shown in [2]. The BERs are listed at three SNR benchmarks
for the BSC and AWGN channels. In general, the list of best
encoders differs in each of these cases. The main conclusions
are as follows.

(i) The BER of the best encoder of memory $m$ does not decrease
further with growing $L$ once the ratio $L/m$ exceeds 4-5.
Analogously, the BER does not decrease much with growing $m$
and constant $L$. The critical ratio $L/m$ relates to the decision
depth parameter of the encoder, which relates asymptotically
to the Gilbert-Varshamov parameter.

(ii) The best encoder in a bad channel is a systematic one.
Systematic feedforward codes have asymptotically half the free
distance growth with $m$, compared to general codes. Thus the
best encoder in a channel of unknown quality is a feedback
systematic one, since these have full distance growth; its BER
will be low in both good and bad channels.

(iii) In about half the combinations $m$, $L$ the best encoders in
a BSC differ significantly from those in the AWGN channel.

(iv) Due to the mapping between information and codebits,
the best encoders are in general not the ones maximizing $d_{\min}$.

Now, assume an unbalanced information source, i.e.,
$\phi_i \neq 1/2$, $1 \le i \le L$. Some data words and their corresponding
codewords have a higher probability than others.

In [3] we present good TB encoders in terms of BER when a
priori information is available. Their performances are listed
for the same channels as before. The main results are:

(i) In general, for BSCs the best encoders for balanced
sources perform badly if a lot of a priori information is available,
e.g., if $\phi_i = 0.95$, $\forall i$. This is because now the codewords
with low Hamming weight are more likely to occur. If the
codewords with $d_{\min}$, i.e., the ones which are most likely to be
decoded in error for a balanced source, are unlikely to occur,
then the a priori information will drastically reduce the BER.

(ii) For BSCs the best encoders for $\phi_i = 0.5$, $\forall i$, are generally
also the best when $\phi_i = 0.7$, $\forall i$. If the source is not strongly
unbalanced, the effect described in (i) loses importance.

(iii) The best encoders for $\phi_i = 0.95$, $\forall i$, perform poorly when
used for data with $\phi_i = 0.5$ or $\phi_i = 0.7$, compared to the best
encoders for those cases, for which the mapping of data to
codebits and the distance spectrum play the major role. The
best encoders for AWGN channels for $\phi_i = 0.5$, $\forall i$, also differ
from the best encoders for strongly unbalanced sources, but
their BER can be almost as good.

(iv) For $\phi_i = 0.95$, $\forall i$, the best encoders for a BSC are not the
best encoders for an AWGN channel, and vice versa, although
they also perform sufficiently well for the other channel model.

(v) The encoders maximizing $d_{\min}$ are generally not those
with best BER.

(vi) When not all data bits carry the same a priori information,
the BER depends on the positions of the bits with most
a priori information. Similar arguments as in (i) explain this
effect. In general, the TB encoder and the amount and structure
of the a priori information must be tuned to each other.
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Abstract  —  Tailbiting  trellis  representations  of  linear 
block codes with an arbitrary sectionalization of the
time axis are studied. A new lower bound on the maximal
state complexity of an arbitrary tailbiting code
is derived. The asymptotic behavior of the derived
bound is investigated. Some new tailbiting representations
for linear block codes of rates R = 1/c, c = 2, 3,
4 are presented.


I.  Introduction 

Tailbiting  is  a  technique  to  terminate  a  convolutional  code 
into  a  block  code  [1].  We  focus  on  constructions  and  bounds 
for  sectionalized  tailbiting  trellises  since  they  may  have  less 
complexity  than  non-sectionalized  ones. 

We consider an $(N, K, d_{\min})$ binary linear block code $C$ with
a generator matrix $G = \{r_j\}$, $j = 1, \ldots, K$. We say that $G$ is
given in tailbiting span form if it consists of rows such that
(circular) $\mathrm{start}(r_j) \neq \mathrm{start}(r_i)$ and $\mathrm{end}(r_j) \neq \mathrm{end}(r_i)$, $i \neq j$,
where $\mathrm{start}(x)$ and $\mathrm{end}(x)$ denote the (circular) number of the
first and the last nonzero section in the vector $x$, respectively.
The $i$th section of $r_j$ is active if $i \in [\mathrm{start}(r_j), \mathrm{end}(r_j))$. The
maximal state complexity or $\mu$-state complexity of the trellis
is defined [2] as $\mu = \max_i \{\log_2 |A_i|\}$, where $|A_i|$ denotes the
number of rows where the $i$th section is active.


II. Lower bound on the state complexity for tailbiting codes

Theorem 1 The state complexity $\mu$ of a linear $(N, K, d_{\min})$
tailbiting code is lower-bounded by

$$\mu \ge \mu_0 = \max_{j = 1, \ldots, K} \{ R\, N_{\min}(j, d_{\min}) - j \}.$$

Moreover, if $\mu_0$ is odd, $\mu \ge \max\{\mu_0, d_{\min}(K + 1)/N - 1\}$.

Denote by $\xi = \mu/N$ the relative trellis complexity. Then
we have the following asymptotic behavior of $\xi$ as $N \to \infty$,

$$\xi \ge \max_{\theta \in [2\delta, 1]} \{ \theta [R - R_{\max}(\delta/\theta)] \},$$

where $\delta = d_{\min}/N$, and $R_{\max}(\cdot)$ is the
McEliece-Rodemich-Rumsey-Welch upper bound.

III. Search techniques and results

We have used the bound in Theorem 1 to find an efficient
(in the sense of state complexity) tailbiting representation for an
$(N, K)$ linear block code using time-invariant convolutional
codes of rate $R = 1/c$, $c = 2, 3, 4$, and state complexity
(constraint length) $\mu$. We exploit two kinds of methods to reject
weak codes. The first one includes rules for rejecting weak
encoders of convolutional codes. The second one rejects those
encoders among the accepted ones which generate poor tailbiting
codes. Some search results are presented in the following
table.

N, K, d_min (BV bounds)   mu (lower bound)   Generators
56,28,12(12-14)           9(8)               477,1505
58,29,12(12-14)           9(8)               433,1275
60,30,12(12-14)           9(8)               217,1665
62,31,12(12-15)           8(8)               435,657
64,32,12(12-16)           8(8)               235,557
66,33,12(12-16)           8(8)               235,557
68,34,13(13-16)           11(9)              4315,5651
72,36,14(15-18)           13(10)             4473,32611
74,37,14(14-18)           11(10)             1353,7461
76,38,14(14-18)           11(10)             1145,7173
78,39,14(15-18)           10(10)
82,41,14(14-20)           10(10)             1157,3455
84,42,14(15-20)           10(10)             1157,3455
92,46,16(15-22)           13(11)             5447,21675
94,47,16(16-22)           12(11)             5135,14477
96,48,16(16-22)           12(11)             5135,14477
110,55,18(18-25)          15(14)             23077,173255
84,28,22(22-27)           11(10)             2215,5467,7647
96,32,24(24-30)           12(11)             2153,11625,17557
99,33,24(24-32)           11(11)             4467,5725,6373
102,34,24(24-32)          11(11)             4465,5357,6373
105,35,25(24-33)          13(13)             20447,25315,37317
108,36,26(24-34)          13(13)             20465,31327,34773
111,37,26(25-34)          13(13)             20445,31527,35757
114,38,26(26-36)          13(13)             20445,31653,37673
120,40,28(28-37)          14(14)             41127,63663,72575
112,28,32(32-40)          11(11)             4447,5277,6335,7533
116,29,32(32-42)          11(11)             4445,6353,6537,7673

Almost all codes meet the Brouwer-Verhoeff (BV) lower bound
on the minimum distance $d_{\min}$ for linear codes and achieve the
lower bound on the state complexity $\mu$. All presented codes
are new best known quasi-cyclic codes. The code (111,37,26)
is better than any previously known linear code with the same
length and dimension, and the codes (92,46,16), (105,35,25)
and (108,36,26) are better than any previously known codes
with the same length and dimension.
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I.  Introduction 

A tail biting (TB) trellis for a code is a trellis with multiple
starting and multiple ending states [1, 2]. Each starting
state corresponds to a unique ending state and they are the
same state. A path in a TB trellis represents a valid codeword
if and only if it starts from a state and ends at the same
state. Such a path is called a TB path. In this paper, two new
iterative algorithms for decoding codes based on their TB trellises
are presented: one is unidirectional and the other is
bidirectional. Both algorithms are computationally efficient and
achieve virtually optimum error performance with a small
number of decoding iterations.

II.  The  Wrap-Around  Viterbi  (WA-V) 
Algorithm 

Let $T$ be an $L$-section TB trellis for a code $C$ with section
boundary locations in $\{0, 1, \ldots, L\}$. For $0 \le t \le L$, let
$\Sigma_t(C) = \{s_t^{(1)}, \ldots, s_t^{(q_t)}\}$ denote the state space of the
trellis at the boundary location (BL)-$t$. $\Sigma_0(C)$ and $\Sigma_L(C)$
are the starting and ending state spaces, respectively, and
$\Sigma_0(C) = \Sigma_L(C)$. For simplicity, we assume that $s_0^{(i)}$ and
$s_L^{(i)}$ are the same state for $1 \le i \le q_0$ (or $q_L$).

The  WA-V  algorithm  processes  T  continuously  in  a  round- 
and-round  manner.  One  round  of  decoding  process  is  called  an 
iteration.  At  the  beginning  of  the  first  iteration,  the  decoder 
starts  from  all  the  states  in  Eo(C)  at  BL-0  with  the  same  ini¬ 
tial  state  metrics.  At  the  end  of  each  iteration,  the  decoder 
attempts  to  make  a  decoding  decision.  If  the  decoding  deci¬ 
sion  is  successful,  the  decoding  process  stops;  otherwise,  the 
decoder  wraps  around  T  and  starts  another  iteration  from  the 
states in $\Sigma_0(C)$ with the starting state metrics equal to the
metrics of their equivalent states in $\Sigma_L(C)$ at the end of the
previous iteration. Suppose the decoder is executing the i-th
iteration. For each state $s_t^{(j)}$ in T, the decoder computes two
metrics, the accumulative state metric (ASM) and the state
metric gain (SMG). The ASM of a state $s_t^{(j)}$ at BL-t in the i-th
iteration, denoted $M^{(i)}(t, s_t^{(j)})$, is defined as the total path
metric of the survivor that originates from a state at the
beginning of the first iteration and terminates at the state
$s_t^{(j)}$. The SMG of the state $s_t^{(j)}$ during the i-th iteration,
denoted $\Delta^{(i)}(t, s_t^{(j)})$, is defined as the path metric of the survivor
$p^{(i)}(s_0^{(k)} | s_t^{(j)})$ that originates from the state $s_0^{(k)}$ at BL-0 at the
beginning of the i-th iteration and terminates at the state $s_t^{(j)}$.
Therefore, $\Delta^{(i)}(t, s_t^{(j)}) = M^{(i)}(t, s_t^{(j)}) - M^{(i)}(0, s_0^{(k)})$, where
$M^{(i)}(0, s_0^{(k)}) = M^{(i-1)}(L, s_L^{(k)})$ and $M^{(1)}(0, s_0^{(k)})$ = constant, for
all $1 \le k \le q_0$. When the decoder reaches BL-L at the
end of the i-th iteration, there are $q_L$ survivors $p^{(i)}(s_0^{(k)} | s_L^{(j)})$
of length L, each terminating at a different state $s_L^{(j)}$ at
BL-L. The path $p^{(i)}(s_0^{(k_0)} | s_L^{(j_0)})$ with the best metric gain
$\Delta^{(i)}(L, s_L^{(j_0)})$ is chosen as the winning path. If the winning
path $p^{(i)}(s_0^{(k_0)} | s_L^{(j_0)})$ is a TB path, decoding stops and
$p^{(i)}(s_0^{(k_0)} | s_L^{(j_0)})$ is the decoded codeword. Otherwise, find the
TB path with the best metric (if any), denoted $p_{T,\mathrm{best}}$, and
store it. The decoder then starts the (i+1)-th iteration. The
decoding process continues until either the winning path at the
end of an iteration is a TB path or a preset maximum number
of iterations $I_{\max}$ is reached. In the latter case, the decoder
outputs the best TB path $p_{T,\mathrm{best}}$ stored in the memory if it
exists; otherwise, it outputs the winning path found at the
end of the $I_{\max}$-th iteration.
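The iteration described above can be sketched in code. The following is only a minimal illustration under assumptions that are not in the text: a rate-1/2 feedforward encoder with generators (7,5) and memory 2, hard decisions, a Hamming-distance branch metric (so "best" means smallest), and the hypothetical function names `encode_tb` and `wava_decode`.

```python
def encode_tb(bits):
    """Tail-biting encoding with the assumed (7,5) feedforward encoder."""
    L = len(bits)
    out = []
    for t in range(L):
        u0, u1, u2 = bits[t], bits[(t - 1) % L], bits[(t - 2) % L]
        out.append((u0 ^ u1 ^ u2, u0 ^ u2))
    return out

def wava_decode(received, max_iter=4):
    """Wrap-around Viterbi sketch: ASM carried across iterations, SMG used to pick the winner."""
    states = [(a, b) for a in (0, 1) for b in (0, 1)]   # state = (u_{t-1}, u_{t-2})
    asm = {s: 0 for s in states}                        # constant start metrics, iteration 1
    best_tb = None                                      # (gain, bits) of best TB path stored so far
    winner = None
    for _ in range(max_iter):
        start_asm = dict(asm)
        # survivor per state: (ASM, start state at BL-0, decoded bits)
        surv = {s: (asm[s], s, []) for s in states}
        for r1, r2 in received:
            new = {}
            for (u1, u2), (metric, s0, bits) in surv.items():
                for u0 in (0, 1):
                    branch = ((u0 ^ u1 ^ u2) != r1) + ((u0 ^ u2) != r2)
                    cand = (metric + branch, s0, bits + [u0])
                    nxt = (u0, u1)
                    if nxt not in new or cand[0] < new[nxt][0]:
                        new[nxt] = cand
            surv = new
        # SMG = ASM at BL-L minus the ASM of the survivor's start state at BL-0
        gains = {s: (m - start_asm[s0], s0, bits) for s, (m, s0, bits) in surv.items()}
        end, (gain, s0, winner) = min(gains.items(), key=lambda kv: kv[1][0])
        if s0 == end:                                   # winning path is a TB path: stop
            return winner
        for s, (g, s0_, b) in gains.items():            # otherwise remember the best TB path
            if s0_ == s and (best_tb is None or g < best_tb[0]):
                best_tb = (g, b)
        asm = {s: surv[s][0] for s in states}           # wrap around: end metrics become start metrics
    return best_tb[1] if best_tb else winner
```

On a noise-free received word the winning path of the first iteration is already a TB path, so the sketch stops immediately; behaviour under noise depends on the code and metric and is not claimed here.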

III.  The  Iterative  Bidirectional  Viterbi 
Decoding  (IBVD)  Algorithm 

This algorithm processes a TB trellis T of a code from
opposite directions with two decoders, called the left- and the
right-decoder, respectively. Both decoders execute the WA-V
algorithm and they collaborate to make a decoding decision.
During each iteration, the two decoders start from opposite
ends of T and work through the trellis until they reach the other
ends of T. For each state in the trellis, two ASMs and two
SMGs are computed by the two decoders. At iteration i, as
soon as a state has been visited by both decoders, it has
two SMGs, denoted $\Delta_l^{(i)}(t, s_t^{(j)})$ and $\Delta_r^{(i)}(t, s_t^{(j)})$, from the
left and the right decoders. The sum of these two SMGs,
denoted $\Delta_c^{(i)}(t, s_t^{(j)})$, is called the composite SMG of the state
$s_t^{(j)}$; it is simply the path metric of the survivor of length
L passing through the state $s_t^{(j)}$ that connects a state at BL-0
with a state at BL-L. This survivor is called a composite
path (CP). The CP at BL-t with the best composite SMG is
called the best CP at BL-t, denoted $BCP_t$. If $BCP_t$ is a
TB path, decoding stops; otherwise, find the best composite
TB path at BL-t (if any), denoted $BCTBP_t$, and store it in
the decoder memory. The iteration process continues until
the BCP at a boundary location is found to be a TB path or
a preset maximum number of iterations, $I_{\max}$, is reached. In
the latter case, the updated BCTBP in the memory is chosen
as the decoded codeword if it exists; otherwise, the BCP stored
in the memory is chosen as the decoded codeword.

IV.  Performance  and  Complexity 
Both the WA-V and IBVD algorithms have been applied to
decoding several convolutional and block codes (including the
(24,12) Golay code). Simulation results show that both
algorithms achieve virtually optimum error performance with a
small number of iterations. In all cases, the IBVD algorithm
converges to MLD performance in two iterations.

Abstract - It is shown that for short and moderate
relative tailbiting lengths and high signal-to-noise ratios
systematic feedback encoders have better bit error
performance than nonsystematic feedforward encoders.
Conditions for when tailbiting will fail are
given and it is described how the encoder starting
state can be obtained for feedback encoders in both
controller and observer canonical form.

I.  Systematic  versus  Nonsystematic  Tailbiting 
Encoders 

Comparing the bit error performance of tailbiting codes
encoded by systematic and nonsystematic encoders [1] shows
that for a bad channel systematic encoders, feedforward or
feedback, give the best performance. Simulations also show
that the best encoders to use when the channel quality is
unknown are the systematic feedback ones. In a good channel
we show that the type of encoder having the best bit
error performance depends on the relative tailbiting length,
i.e., the ratio of the tailbiting length to the memory. For a good channel,
ML decoding, and a rate R = b/c tailbiting code of length L an
upper bound on the bit error probability can be expressed as
$P_b < \frac{1}{bL} \sum_{d \ge d_{\min}} b_d P_d$, where $b_d$ is the sum of all bit errors for
all codewords of weight d and $P_d$ is the probability that a word
of weight d is chosen instead of the allzero word. For a given
length L and memory m the encoder giving the lowest bit error
probability in a good channel is the one with the largest possible
minimum distance and the smallest possible $b_{d_{\min}}$.
For rate R = 1/2 a search has been made for these encoders
at various lengths and encoder memories. We can identify
three regions where different encoder types give the best
performance. For very short relative tailbiting lengths the best
feedforward encoders are systematic and give the same bit error
probability as the best systematic feedback encoders. For
short and medium relative tailbiting lengths, systematic feedback
encoders are typically a factor of 1.5-2 better than the
feedforward ones. For long relative tailbiting lengths feedforward
encoders typically give a factor of 2 better performance
than the systematic feedback encoders. The explanation for
this lies in the type of codeword which leads to the minimum
distance. We show that this in turn depends on the relative
tailbiting length.

II.  Tailbiting  failure 

A rate R = b/c feedback convolutional encoder of memory m
can be viewed as consisting of b linear feedback shift registers
(LFSRs), where the longest shift register has length m. For a
given LFSR we define the cycle characteristic of the LFSR as
the set of all possible cycles of its output. Consider first a rate
R = 1/c encoder. Assume that the LFSR has a cycle of length
p. Then if we are in one of the states that belongs to this cycle
and feed the encoder with only zeros at the input, corresponding
to an allzero information sequence, the encoder returns to
the same state after p steps. If the tailbiting length (number
of trellis sections) L is a multiple of p, then we have more than
one codeword corresponding to an allzero input, since the
allzero codeword also corresponds to the allzero input. This
means that for this L, we have no one-to-one mapping between
the blocks of information bits and the codewords, and
the tailbiting technique cannot work. Every polynomial has
at least one cycle of length 1, the zero cycle corresponding to
the allzero codeword, which causes no trouble, but for
any multiple of the length of any other cycle, the tailbiting technique fails.
For a general rate R = b/c encoder the tailbiting technique
does not work for any multiple of the cycle lengths in the cycle
characteristic of any of the b LFSRs. See also [2] [3] [4].

1 This research was supported by the Foundation for Strategic
Research - Personal Computing and Communication under Grant
PCC-9706-09.
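The failure condition above can be checked mechanically for a single LFSR. The sketch below is illustrative only: the state-update convention (feedback bit shifted in on the left), the tap ordering, and the function names `nonzero_cycle_lengths` and `tailbiting_fails` are assumptions, not taken from the paper. It enumerates the zero-input state cycles and reports whether a given tailbiting length L is a multiple of some nonzero cycle length.

```python
from itertools import product

def nonzero_cycle_lengths(taps):
    """Lengths of the zero-input state cycles, excluding the allzero cycle.

    taps = (q_1, ..., q_m) are the GF(2) feedback coefficients; q_m must be 1
    so that the zero-input state map is invertible and the states split into cycles.
    """
    m = len(taps)
    if taps[-1] != 1:
        raise ValueError("last feedback tap must be 1")
    seen, lengths = set(), []
    for start in product((0, 1), repeat=m):
        if start in seen or not any(start):
            continue
        state, length = start, 0
        while state not in seen:
            seen.add(state)
            fb = sum(t & s for t, s in zip(taps, state)) % 2
            state = (fb,) + state[:-1]      # shift the feedback bit into the register
            length += 1
        lengths.append(length)
    return lengths

def tailbiting_fails(taps, L):
    """True if L is a multiple of the length of some nonzero cycle."""
    return any(L % p == 0 for p in nonzero_cycle_lengths(taps))
```

For the feedback polynomial 1 + D + D^2 (taps (1, 1)) the only nonzero cycle has length 3, so under these assumptions tailbiting fails exactly when L is a multiple of 3.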

III.  Finding  the  Encoder  Starting  State 

For polynomial convolutional encoders realized in controller
canonical form the initial state of the encoder is simply given
by the reciprocal of the last m input b-tuples, but for systematic
feedback encoders the starting state depends on all of the
information bits to be encoded. Several methods are presently
known for finding the starting state in the controller canonical
form. Certain algebraic equations may be set up and solved
to obtain the starting state [2] [4]. In some cases the number
of delay elements can be reduced by realizing the encoder in
observer canonical form. For example, the minimal realization
of rate R = 2/3 and R = 3/4 systematic feedback encoders is
the observer canonical form [5]. We give a method for finding
the starting state for this form.

Abstract - We study the problem of finding zero-error
instantaneous codes for the Slepian-Wolf configuration [1].
By a zero-error instantaneous code
we mean two encoder maps $f_1 : \mathcal{X} \to \{0,1\}^*$,
$f_2 : \mathcal{Y} \to \{0,1\}^*$ and a decoding algorithm which, for
any pair of encoder outputs $f_1(x_1) f_1(x_2) f_1(x_3) \ldots$ and
$f_2(y_1) f_2(y_2) f_2(y_3) \ldots$, can correctly determine $x_1$ and $y_1$
by reading only the first length($f_1(x_1)$) bits of the X
encoder output and the first length($f_2(y_1)$) bits of the
Y encoder output. For $|\mathcal{X}| = 2$, we find a necessary and
sufficient condition for the existence of a zero-error
instantaneous code with a given set of codeword lengths
for Y. Using this condition, we derive an upper bound
on the minimum expected codeword length for Y and
construct a simple example showing that the coding
scheme proposed by Kh, Jabri and Al-Issa [2] is not
optimal in general and, more surprisingly, that the optimal
code may violate the "Morse condition" that the
more probable of two symbols never has the longer
codeword. For $|\mathcal{X}| = 3$, we find a necessary but not
sufficient condition for the existence of a zero-error
instantaneous code with a given set of codeword lengths
for Y. Moreover, for $|\mathcal{X}| \ge 3$, the existence of such a
code is shown to be related to a rectangle packing
problem.

I.  Introduction 

The variable-length coding scheme proposed by Kh, Jabri and
Al-Issa [2], henceforth called KJA coding, is summarized as
follows. One of the sources, say X, is encoded by a Huffman
code corresponding to the marginal p.m.f. p(x), while for Y,
a Huffman code is constructed for $\phi(Y)$ rather than Y, where
$\phi : \{0, 1, 2, \ldots, |\mathcal{Y}| - 1\} \to \{0, 1, 2, \ldots, \delta - 1\}$ and $\delta \le |\mathcal{Y}|$.
$\delta$ and $\phi$ are chosen in such a way that $\phi(y_1) \ne \phi(y_2)$ if $\exists x'$
s.t. $p(x', y_1) > 0$ and $p(x', y_2) > 0$, and the entropy of $\phi(Y)$
attains the minimum over all possible choices of $\phi$.

KJA codes operate at rates $R_X \to H(X)$ and $R_Y \le H(Y)$
and in general have lower rates than Witsenhausen's codes [3]
because they use the nonzero p(x, y) values explicitly, whereas
Witsenhausen distinguished only those (x, y) with p(x, y) = 0
from those for which p(x, y) > 0. However, we notice that it
is not necessary for the distinct codewords for Y to satisfy the
prefix condition. Therefore, we propose an improved coding
scheme as follows.

For X, a Huffman code is used as before. For Y, we
abandon the mapping $\phi$ and encode by $f_2$ directly, where
$f_2$ has the property that for each $x \in \mathcal{X}$ the set of
codewords $\{f_2(y) : p(x, y) > 0\}$ satisfies the prefix condition; we
call such a code an admissible $f_2$. Decoding is done in two
steps. X is first decoded in the usual way for Huffman codes.
Then the decoded value x will give us a set of codewords
$\{f_2(y) : p(x, y) > 0\}$ which can be used to decode Y. The fact
that $\{f_2(y) : p(x, y) > 0\}$ satisfies the prefix condition
guarantees that as we read the encoder output of Y sequentially, the
first match to one of the codewords in $\{f_2(y) : p(x, y) > 0\}$
corresponds to the true value of Y. Therefore, this modified
scheme is a zero-error instantaneous code.

II.  Main  Results 

For $i \in \mathcal{X}$ and $j \in \mathcal{Y}$, let $A_i = \{y : p(i, y) > 0\}$ and
$l_j$ = length($f_2(j)$).

Theorem 1: For $|\mathcal{X}| = 2$, if $f_2$ is admissible, then
$\sum_{j \in A_i} 2^{-l_j} \le 1$ for i = 0, 1. Conversely, if $\{l_j\}$ satisfies
$\sum_{j \in A_i} 2^{-l_j} \le 1$ for i = 0, 1, then there exists an admissible
$f_2$ such that $l_j$ = length($f_2(j)$) for all $j \in \mathcal{Y}$.

Non-optimality of KJA coding: For the joint p.m.f.
{p(0,0) = 0.201, p(0,1) = 0.201, p(0,2) = 0.201, p(0,3) =
0.1, p(1,3) = 0.1, p(1,4) = 0.197}, the expected codeword
length of Y for KJA coding is 2, while that for our code is
1.803.
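The figure 1.803 can be reproduced by a small search over codeword lengths that satisfy the Kraft-type condition of Theorem 1 on each support set. This is only a sketch: the helper names are hypothetical and the search is limited to lengths 1 to 4, which is an assumption made to keep the enumeration small.

```python
from itertools import product

# Joint p.m.f. from the example above, keyed by (x, y).
p = {(0, 0): 0.201, (0, 1): 0.201, (0, 2): 0.201,
     (0, 3): 0.1, (1, 3): 0.1, (1, 4): 0.197}

def supports(p):
    """A_x = set of y with p(x, y) > 0."""
    A = {}
    for (x, y), q in p.items():
        if q > 0:
            A.setdefault(x, set()).add(y)
    return A

def kraft_ok(lengths, A):
    """Theorem 1 condition: the Kraft sum over every A_x is at most 1."""
    return all(sum(2.0 ** -lengths[y] for y in ys) <= 1.0 for ys in A.values())

def expected_length(lengths, p):
    return sum(q * lengths[y] for (_, y), q in p.items())

A = supports(p)
ys = sorted({y for _, y in p})
best = None
for ls in product(range(1, 5), repeat=len(ys)):   # lengths 1..4 (assumed search range)
    cand = dict(zip(ys, ls))
    if kraft_ok(cand, A) and (best is None or
                              expected_length(cand, p) < expected_length(best, p)):
        best = cand
```

Within this search range the minimizer is $l_0 = l_1 = l_2 = l_3 = 2$, $l_4 = 1$, with expected length $2 \cdot 0.803 + 0.197 = 1.803$.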

Theorem 2: For $|\mathcal{X}| = 3$, if $f_2$ is admissible, then

$\sum_{j \in A_i} 2^{-l_j} \le 1$ for i = 0, 1, 2   (1)

and

$\sum_{j \in (A_0 \cap A_1) \cup (A_1 \cap A_2) \cup (A_2 \cap A_0)} 2^{-l_j} \le 1$.   (2)

Unlike the case for $|\mathcal{X}| = 2$, (1) is not a sufficient condition
for the admissibility of $f_2$ when $|\mathcal{X}| = 3$. Furthermore,
even with the additional constraint (2), we still do not have
a sufficient condition. This can be verified by the following
example.

If $|\mathcal{X}| = 3$, $|\mathcal{Y}| = 7$, $A_0 = \{0, 2, 4, 6\}$, $A_1 = \{1, 2, 5, 6\}$,
$A_2 = \{3, 4, 5, 6\}$, then (1) and (2) can be satisfied by choosing
$\{l_j\}$ s.t. $l_0 = l_1 = l_3 = 2$, $l_2 = l_4 = l_5 = 3$ and $l_6 = 1$. Yet no
codeword sets with these lengths can satisfy the admissibility
requirement. We will show this by considering a rectangle
packing problem.
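The counterexample is small enough to confirm by exhaustive search instead of the rectangle-packing argument. The sketch below is an illustration with hypothetical helper names, not the authors' proof: it enumerates every assignment of binary codewords with the stated lengths and tests the prefix condition on each support set.

```python
from itertools import product

lengths = {0: 2, 1: 2, 2: 3, 3: 2, 4: 3, 5: 3, 6: 1}
A = [{0, 2, 4, 6}, {1, 2, 5, 6}, {3, 4, 5, 6}]

def prefix_free(words):
    """True if no word is a prefix of (or equal to) another word in the list."""
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(words) for j, b in enumerate(words))

def all_words(n):
    return ["".join(bits) for bits in product("01", repeat=n)]

def admissible_exists(lengths, A):
    """Brute-force search for an f2 whose restriction to every A_i is prefix-free."""
    ys = sorted(lengths)
    for code in product(*(all_words(lengths[y]) for y in ys)):
        f2 = dict(zip(ys, code))
        if all(prefix_free([f2[y] for y in Ai]) for Ai in A):
            return True
    return False

# Kraft sums of condition (1) and the sum in condition (2)
kraft_sums = [sum(2.0 ** -lengths[j] for j in Ai) for Ai in A]
pairwise = (A[0] & A[1]) | (A[1] & A[2]) | (A[2] & A[0])
union_sum = sum(2.0 ** -lengths[j] for j in pairwise)
```

Here every sum in (1) equals 1 and the sum in (2) equals 7/8, yet the search over all 65536 codeword assignments finds no admissible $f_2$.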

Abstract - In this paper, we propose a new match length
function (MLF) called multi-value MLF (MVMLF) to be
used with the Lempel-Ziv type (LZ1) data compression
scheme. By restricting the function to five essential
constraints, we obtain the most complete and compact
dynamic dictionary which is efficiently updated. Based
on MVMLF, we present two asymptotically optimal
compression schemes.


Summary 

With reference to the Lempel-Ziv data compression algorithm
(LZ1) [1], various modifications have been made to reduce
the bits required to encode the match length and match
position. In [2], Gavish and Lempel describe the match length
function (MLF), and propose to use the MLF to save on
encoding the length of a match. As a modification of their
approach, in this paper we introduce and employ the multi-value
MLF (MVMLF) to achieve some desired properties, such as
obtaining the most complete and compact dynamic dictionary,
which is efficiently modified and updated according to the
existing redundancy in the window.

The MLF as introduced in [2] associates just one unique
nonnegative integer called the match length value to each
position of the current input data string in the window. The MLF
is defined as follows.

Let $X_0^{N-1} = x_0 x_1 x_2 \ldots x_{N-1}$ denote an input sequence of
length N over a finite alphabet of size $\alpha$ and let n, $0 < n \ll N$,
be the size of the sliding window. Let also $l_i(k)$ denote the
value of the MLF for the k-th position of the window, when the
window starts with $x_i$, the i-th symbol of the input string. To
determine $l_i(k)$, we look for the largest integer $l'$, $l' <$ LMAX,

that  satisfies  X'*kk+l'~l  =  Xl'+"+,‘(m)~2  0  <m<k  ,  and  then 

we set $l_i(k) = \max\{l', LMIN\}$, where LMIN and LMAX
indicate the minimum and maximum permissible values of the
MLF. All strings of the form $X_{i+k}^{i+k+l_i(k)-1}$, $0 \le k < n$, are
referred to as valid strings, and the set of all valid strings
creates the dictionary.

The MVMLF that we propose in this paper may uniquely
associate several match length values to some positions in
the string, but does not associate any match length value to
other positions. That is, numerous valid strings can
start from some positions (called valid positions),
whereas no valid string starts from other positions (called
invalid positions). To save the required memory, and also to
implement the recursive algorithm for MVMLF evaluation
easily, we consider a special form of MVMLF in which the
values associated to a valid position are successive. We
define three functions $f_i(k)$, $l_{\min_i}(k)$ and $l_{\max_i}(k)$, each of which
associates a value to each position k, for $0 \le k < n$, where
$f_i(k) = 0$ implies invalidity of position k and $f_i(k) = 1$
implies validity of all strings $X_{i+k}^{i+k+l-1}$, $l = l_{\min_i}(k), \ldots, l_{\max_i}(k)$,
started from position k. The subscript i in these functions has
the same meaning as in $l_i(k)$.


To  have  the  most  complete  set  of  valid  strings  with 
minimum  redundancy,  in  which  lengthy  strings  can  be 
included  as  easily  as  short  ones  ,  and  also  to  update  the 
MVMLF  efficiently  ,  we  force  MVMLF  to  satisfy  the 
following  constraints : 

•  All  prefixes  of  a  valid  string  are  valid. 

•  Each  string  with  the  occurrence  of  at  least  two  times  in 
the  window  is  a  valid  string. 

•  Validity  of  a  string  is  preserved  as  long  as  the  string 
remains  in  the  window. 

•  A  string  can  be  valid  in  just  one  position  in  the  window. 

•  A string appearing in several positions in the window
must be valid in the rightmost position of its occurrence.

Assume that the subsequence $X_i^{i+n-1}$ is in the window and
MVMLF has been evaluated for all positions at the left side
of k. To evaluate MVMLF at position k, we first set
$f_i(k) = 1$, $l_{\min_i}(k) = l_{\max_i}(k) = LMIN$. We then do
the following steps sequentially for k' = k-1, k-2, ..., 2, 1, 0.

1- Specify l, the length of match between positions k and k',
as:

$l = \min\{ \max\{ l' : X_{i+k}^{i+k+l'-1} = X_{i+k'}^{i+k'+l'-1} \}, LMAX \}$


2- 


f\ 


LMIN  <  l 
L^k')<l 


,  Set 


M')  =  o 

L«,(k')  =  l 


or 


if 


m')= i 


(A')  ,  Set 


The  MVMLF  defined  in  this  way  satisfies  all  five  constraints 
mentioned  above.  We  propose  two  data  compression 
schemes  based  on  using  MVMLF.  We  prove  that  under  the 
conditions  : 


lim  'SilMNNl  =  0  ,  LMAXW  >\ °^-.and 


LMIN  < 


log  n 

log  n  +  log  LMAX  + 1 
log  cc 


H+£ 
<  LMIN  + 1 


the  proposed  schemes  are  asymptotically  optimal. 

Abstract - This paper treats the codeword length of
a fixed-to-variable length code (FV code) as a random
variable and analyzes its asymptotic properties. It is
shown that for a given general source [2] the codeword
length can be viewed as the self information as $n \to \infty$
if a certain kind of optimal lossless FV code is used.

I.  Introduction 

Consider an FV code that encodes a discrete random variable
$X^n \in \mathcal{X}^n$ into K-ary codewords with $K \ge 2$. We define
the FV code as a pair of an encoder $\varphi_n$ and a decoder
$\psi_n$, where $\varphi_n$ is a surjective mapping from $\mathcal{X}^n$ to a code
$C_n \subset \{0, 1, \ldots, K-1\}^*$ and $\psi_n$ a mapping from $C_n$ to $\mathcal{X}^n$.
Usually, performance of FV codes is measured by the average
codeword length $E[l(\varphi_n(X^n))]$ under the requirement that
the decoding error probability $\varepsilon_n = \Pr\{\psi_n(\varphi_n(X^n)) \ne X^n\}$ is
equal to zero, where $E[\cdot]$ denotes the expectation with respect
to the probability distribution $P_{X^n}$ of $X^n$.

In this paper we investigate asymptotic properties of the random
variable $l(\varphi_n(X^n))$ as $n \to \infty$ for the cases that $\varepsilon_n = 0$
for all $n \ge 1$ and $\limsup_{n \to \infty} \varepsilon_n \le \varepsilon$ for an arbitrary $\varepsilon \in [0, 1)$.
In both cases $l(\varphi_n(X^n))$ turns out to be deeply related to another
random variable $\log_K \frac{1}{P_{X^n}(X^n)}$, provided that FV codes
satisfying a certain kind of optimality are used.
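As a small numerical illustration of how a codeword length can track the self-information, the sketch below assigns Shannon-type lengths $\lceil \log_K (1/p) \rceil$ to a toy distribution. This is an assumed stand-in, not the class of codes studied in the paper; the function names and the example p.m.f. are hypothetical.

```python
import math

def shannon_lengths(pmf, K=2):
    """Codeword lengths ceil(log_K(1/p)) for every symbol with p > 0."""
    return {x: math.ceil(-math.log(p, K)) for x, p in pmf.items() if p > 0}

def kraft_sum(lengths, K=2):
    return sum(K ** -l for l in lengths.values())

pmf = {"a": 0.45, "b": 0.35, "c": 0.2}          # toy distribution (assumption)
lengths = shannon_lengths(pmf)
# gap between codeword length and self-information, per symbol
gaps = {x: lengths[x] + math.log(pmf[x], 2) for x in pmf}
```

These lengths satisfy the Kraft inequality and each exceeds the self-information by less than one symbol, so after normalizing by n the difference vanishes as the block length grows.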

II.  Lossless  Case 

Suppose that $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is an infinite sequence of random
variables (or the general source [2]) with a countably
infinite alphabet $\mathcal{X}$. In order to unveil a relationship between
the two random variables, we consider the following class of
FV codes for $\mathbf{X}$ originally defined in [5].

Definition 1 An infinite sequence of FV codes $\{(\varphi_n, \psi_n)\}_{n=1}^{\infty}$
is called asymptotically mean-optimal if it satisfies all of

(L1) $\sum_{x^n \in \mathcal{X}^n} K^{-l(\varphi_n(x^n))} \le 1$ for all $n \ge 1$,

(L2) $\limsup_{n \to \infty} \left\{ \frac{1}{n} E[l(\varphi_n(X^n))] - \frac{1}{n} H(X^n) \right\} \le 0$,

(L3) $\varepsilon_n = 0$ for all $n \ge 1$.

It is easy to check that the Shannon-Fano-Elias code (e.g., [1]) is
asymptotically mean-optimal if it is applied to $X^n$, $n \ge 1$. We
have the following theorem on asymptotically mean-optimal
FV codes that is comparable with Nemetz and Simmons [4]
treating discrete memoryless sources with a finite alphabet.

Theorem 1 If $\left\{ \frac{1}{n} \log_K \frac{1}{P_{X^n}(X^n)} \right\}_{n=1}^{\infty}$ is uniformly integrable,
i.e., it satisfies

$\lim_{u \to \infty} \sup_{n \ge 1} \frac{1}{n} \sum_{x^n : \frac{1}{n} \log_K \frac{1}{P_{X^n}(x^n)} \ge u} P_{X^n}(x^n) \log_K \frac{1}{P_{X^n}(x^n)} = 0$

[3], then for any $\delta > 0$ all sequences of asymptotically
mean-optimal FV codes $\{(\varphi_n, \psi_n)\}_{n=1}^{\infty}$ satisfy

$\lim_{n \to \infty} \Pr\{X^n \in W_n(\delta)\} = 0$,

where $W_n(\delta)$ is defined as

$W_n(\delta) = \left\{ x^n \in \mathcal{X}^n : \left| \frac{1}{n} l(\varphi_n(x^n)) - \frac{1}{n} \log_K \frac{1}{P_{X^n}(x^n)} \right| > \delta \right\}$.

III. $\varepsilon$-Error Case

Next, we consider infinite sequences of prefix-free FV codes
satisfying $\limsup_{n \to \infty} \varepsilon_n \le \varepsilon$ for an arbitrarily fixed $\varepsilon \in [0, 1)$.
In this setting we characterize the asymptotic behavior of the
codeword length of FV codes belonging to the following class.

Definition 2 An infinite sequence of FV codes $\{(\varphi_n, \psi_n)\}_{n=1}^{\infty}$
is called asymptotically mean-$\varepsilon$-optimal if it satisfies all of

(E1) $\sum_{c \in C_n} K^{-l(c)} \le 1$ for all $n \ge 1$,

(E2) $\lim_{\gamma \downarrow 0} \limsup_{n \to \infty} \left\{ \frac{1}{n} E[l(\varphi_n(X^n))] - \frac{1}{n} G_{\varepsilon+\gamma}(X^n) \right\} \le 0$,

(E3) $\limsup_{n \to \infty} \varepsilon_n \le \varepsilon$,

where $G_{\varepsilon}(X^n)$ is defined as

$G_{\varepsilon}(X^n) = \inf_{A_n : \Pr\{X^n \in A_n\} \ge 1 - \varepsilon} \sum_{x^n \in A_n} P_{X^n}(x^n) \log_K \frac{\Pr\{X^n \in A_n\}}{P_{X^n}(x^n)}$.

The mean-$\varepsilon$-optimal FV code can be easily constructed in a
manner similar to the construction of the weak variable length
code [3, Sect. 1.8]. We have the following theorem on the class
of asymptotically mean-$\varepsilon$-optimal codes.

Theorem 2 For any $\delta > 0$ all sequences of asymptotically
mean-$\varepsilon$-optimal FV codes $\{(\varphi_n, \psi_n)\}_{n=1}^{\infty}$ satisfy

$\lim_{n \to \infty} \Pr\{X^n \in D_n \cap W_n(\delta)\} = 0$,

where $D_n$ is defined as $D_n = \{x^n \in \mathcal{X}^n : \psi_n(\varphi_n(x^n)) = x^n\}$.

Abstract - We introduce the idea of proper and
almost surely complete parsing. This parsing can
uniquely segment the source output with probability
one, and strengthens the coding converse theorem.
Some kinds of non-proper parsing are involved in the
proper and almost surely complete parsing.

I.  Introduction 

So far, a variable-to-variable (VV) length encoder has been
considered as a variable-to-fixed (VF) length encoder followed
by a fixed-to-variable (FV) length encoder [1]. Let us call
this type of VV length encoder a VF-FV length encoder.
The VF length encoder has a parser as a preprocessor, which
segments the source output into a concatenation of variable-length
strings, each of which belongs to a dictionary. The
dictionary is supposed to be proper, i.e., no string in the
dictionary is a prefix of another string in the dictionary.
Moreover, the dictionary is supposed to be complete, i.e., every
infinite sequence has a prefix in the dictionary. Generally, a
proper and complete dictionary has exactly one prefix for every
infinite sequence.

We  consider  an  i.i.d.  source  with  a  countable  alphabet  X 
with  distribution  P.  If  the  dictionary  of  the  parser  is  proper 
and  complete,  then  the  lengths  of  its  entries  are  uniformly 
bounded.  However,  it  is  not  necessary  to  bound  the  lengths 
of  the  entries  for  a  VV  length  encoder. 

II. Almost Sure Completeness

Definition 1 A dictionary, which is a set of finite length
sequences over $\mathcal{X}$, is said to be almost surely complete (a.s.c.)
if the probability that the dictionary has a prefix of the
sufficiently long source output is one.

For a proper and a.s.c. dictionary, the lengths of its entries
are not generally uniformly bounded and the dictionary
has exactly one prefix of a sufficiently long source output as
its entry with probability one. The "weakly unique parsableness"
for VF length coding defined in [2] is equivalent to this
property. Note that being complete implies being a.s.c. An
example of a proper, a.s.c. and non-complete dictionary over
{0, 1} is {0, 10, 110, 1110, ...}, which has no prefix of 111....

With an a.s.c. dictionary, we reinterpret a VV length encoder.
A VV length encoder consists of a parser and a prefix
encoder $\varphi$. The parser has a proper and a.s.c. dictionary $\psi$.
The prefix encoder $\varphi$ emits a codeword $\varphi(x)$ for each string
x in $\psi$. Let $(\psi, \varphi)$ denote a VV length encoder. Note that
the VV length encoder can no longer be decomposed into a
VF length encoder and an FV length encoder. We now must
abandon the requirement that every infinite sequence out of the
source can be encoded, but such sequences can still be encoded
with probability one.

Theorem 1 (Coding Theorem) We define the coding
rate of a VV length encoder $(\psi, \varphi)$ as $E|\varphi| / E|\psi| =
\left( \sum_{x \in \psi} P(x) |\varphi(x)| \right) / \left( \sum_{x \in \psi} P(x) |x| \right)$, where $|x|$ denotes the
length of a string x. Let $\mathcal{C}$ denote the collection of VV length
encoders with a proper and a.s.c. dictionary. We have

$\inf_{(\psi, \varphi) \in \mathcal{C}} E|\varphi| / E|\psi| = H(P)$,

where $H(P) = -\sum_{a \in \mathcal{X}} P(a) \log P(a)$ and the base of the
logarithm is equal to the size of the code alphabet.

The theorem consists of the direct part and the converse
part. The direct part can be replaced by that of the FV length
coding because an FV length encoder is also a VV length
encoder. The converse part can be demonstrated similarly to
the VF-FV length coding theorem by means of

Lemma 1 For an i.i.d. source over a countable alphabet
$\mathcal{X}$ with distribution P, if $\psi$ is proper and a.s.c., then
$-\sum_{x \in \psi} P(x) \log P(x) = E|\psi| H(P)$.

The proof for a finite alphabet can be found in [3].
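Lemma 1 can be checked numerically on small examples. The sketch below is an illustration only, with an assumed binary i.i.d. source (P(0) = 0.7) and hypothetical helper names. It evaluates both sides for the finite proper and complete dictionary {0, 10, 11} and for a 200-entry truncation of the a.s.c. dictionary {0, 10, 110, ...}, whose omitted probability mass is negligible.

```python
import math

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def dictionary_stats(words, P):
    """Return (entropy of the parsed strings, mean string length, total probability)."""
    probs = [math.prod(P[a] for a in w) for w in words]
    mean_len = sum(p * len(w) for p, w in zip(probs, words))
    return entropy(probs), mean_len, sum(probs)

P = {"0": 0.7, "1": 0.3}                      # assumed i.i.d. binary source
H = entropy(P.values())

finite = ["0", "10", "11"]                    # proper and complete
truncated = ["1" * k + "0" for k in range(200)]   # truncation of {0, 10, 110, ...}

h_fin, len_fin, mass_fin = dictionary_stats(finite, P)
h_asc, len_asc, mass_asc = dictionary_stats(truncated, P)
```

In both cases the entropy of the parsed strings equals the mean string length times H(P) up to floating-point error, with mean lengths 1.3 and 1/0.7 respectively.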

III. The Longest Matching

Properness and almost sure completeness together
guarantee that the source output is uniquely segmented with
probability one. However, properness is not a necessary
condition. We allow the dictionary not to be proper, and let
the parser cut the source output at the tail of the longest
match in the dictionary. Then, the necessary and sufficient
condition for unique parsing is the following.

Definition 2 We say that a dictionary $\psi$ is longest matchable
if in $\psi$ there exists the longest prefix of a sufficiently long
source output with probability one.

A longest matchable dictionary is a.s.c. but may be non-proper.
A simple example of such a dictionary is {0, 00, 10,
110, 1110, ...}, which is obviously non-proper.

For a longest matchable dictionary, we will try to obtain
an equivalent, proper and a.s.c. dictionary. Let $(\psi_0, \varphi_0)$ be a
VV length encoder with a longest matchable dictionary. For
n = 1, 2, ..., we recursively define $\psi_n$ as follows. Let $M_{n-1}$
be the collection of entries of $\psi_{n-1}$ each of which is a prefix of
another entry in $\psi_{n-1}$. Define

$\psi_n = (\psi_{n-1} \setminus M_{n-1}) \cup \{ y \in M_{n-1}\psi_0 \mid$ there is
no prefix of y in $\psi_{n-1} \setminus M_{n-1} \}$,

where $M_{n-1}\psi_0$ is the collection of concatenations of two
strings from $M_{n-1}$ and $\psi_0$, respectively. If $\psi_0$ is proper, then
$\psi_n = \psi_0$ for all n. For $\{\psi_n\}$, we now let

$\psi_\infty = \liminf_{n \to \infty} \psi_n$.

It is shown that $\psi_\infty$ is proper. Unfortunately, it may be
empty.

Theorem 2 Let $(\psi_0, \varphi_0)$ be a VV length encoder with longest
matchable parsing. If $\psi_\infty$ is a.s.c., then with some $\varphi_\infty$, the
VV length encoder $(\psi_\infty, \varphi_\infty)$ emits the same codewords as
$(\psi_0, \varphi_0)$ with probability one.

Corollary 1 If $(\psi_0, \varphi_0)$ is a VV length encoder with longest
matchable parsing and $\psi_\infty$ is a.s.c., then $(\psi_0, \varphi_0)$ obeys the
VV length coding theorem with proper and a.s.c. parsing.

Abstract - We investigate the Gaussian side information
channel in the Shannon [1] setup and propose a method that
achieves correlation between the signal and the side information
noise. The capacity, however, remains to be determined.


I.  Introduction 

Shannon [1] investigated the communication system in which the
state $s_k$ of the channel $P_{ch}(y|x, s)$ during transmission k =
1, 2, ..., K is selected according to the distribution $P_S(s)$. Both the
channel and the state selector are assumed to be memoryless. The
encoder sends the message $m \in \{1, 2, \ldots, \exp(KR)\}$ to the decoder.
When $x_k$ is to be produced, the encoder may use its knowledge of
the state $s_k$, which is made available to it just before transmission
k is about to begin, and all previous states $s_1, s_2, \ldots, s_{k-1}$. Hence
$x_k = F(m, s_1, s_2, \ldots, s_{k-1}, s_k)$ for some encoding function $F(\cdot)$.
When the channel input alphabet $\mathcal{X}$ and state alphabet $\mathcal{S}$ are
discrete, the capacity is known (see [1]). This is not the case for the
Gaussian side information channel, however. Here the channel output is
Y = X + S + Z, where S and Z are independent Gaussian random
variables with mean 0 and variances Q and N respectively. The
codewords $X^K = (X_1, X_2, \ldots, X_K)$ must satisfy the power constraint
$\sum_{k=1}^{K} X_k^2 \le KP$. The objective here is to find the capacity of this
Gaussian channel.

A lower and an upper bound for the capacity C in nats per
transmission are

$\frac{1}{2} \ln\left( \frac{P + Q + N}{Q + N} \right) \le C \le \frac{1}{2} \ln\left( \frac{P + N}{N} \right)$.   (1)

In [3] noise cancellation and noise concentration were studied.


II.  Correlating  signal  and  noise 

Here  we  transpose  the  Costa  method  [2]  by  trying  to  realize  correla¬ 
tion  on  the  symbol  level.  In  transmission  k  we  transmit  a  signal 
from  the  set 


B  = 


5 B_  _3B  _B  B  3B  5B 
2’  2’  2  ’  +  2  ’  +  2  ,+  2  ’ 


for  some  well  chosen  B.  Assume  that  during  transmission  k  the  mes¬ 
sage  j  G  {1, 2,  •  •  ■  ,  J]  must  be  transmitted.  This  message  corre¬ 
sponds  to  a  subset  Bj  of  B.  E.g.  for  J  —  2  we  could  get  the  subsets 


Bi  = 
B2  = 


B  5  B 

+  — 1  +~Z~,  ■ 
2  2 

B  3  B 

~2’+T 


and for j = 1 we choose a signal $u_k \in \mathcal{B}_1$ and for j = 2 we choose
a signal $u_k$ from $\mathcal{B}_2$. What we mean by this is that the transmitter
chooses the input $x_k$ such that $u_k = x_k + s_k \in \mathcal{B}_j$ and $x_k^2$ is minimal,
when message j is to be transmitted. The channel output $y_k = x_k + s_k + z_k = u_k + z_k$
is the signal $u_k$ to which Gaussian noise with variance N is added. It will be
clear that for $B \gg \sqrt{N}$ we obtain a rate R close to ln(J) nats per
transmission. On the other hand the power P that is needed to concentrate the
signal on $\mathcal{B}_j$ is roughly $(JB)^2/12$. Hence $J \approx \sqrt{12P/B^2}$.
If we take $B^2 = 12N$ then $J \approx \sqrt{P/N}$ and we achieve a rate R close to
$(1/2)\ln(P/N)$, which is independent of Q and more or less what we want for
$P \approx J^2 N \gg N$, hence for J not too small.
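The concentration step can be illustrated with a short simulation. This is only a sketch under stated assumptions: the indexing of the subsets as $\mathcal{B}_j = \{(j - 1/2)B + mJB\}$ and the helper names `nearest_in_subset` and `concentration_power` are hypothetical choices made for illustration, not taken from the paper. The transmitter picks the point of $\mathcal{B}_j$ nearest to the state s, and the average of $x^2 = (u - s)^2$ should then come out near $(JB)^2/12$ when the state spread is large compared with JB.

```python
import math
import random


def nearest_in_subset(s, j, J, B):
    """Return the point of B_j closest to the state s.

    Assumed indexing (hypothetical): B_j = {(j - 1/2) * B + m * J * B, m integer},
    for j = 1..J, so the J subsets interleave with spacing B.
    """
    offset = (j - 0.5) * B
    period = J * B
    m = round((s - offset) / period)
    return offset + m * period


def concentration_power(J, B, Q, trials=20000, seed=1):
    """Monte Carlo estimate of E[x^2], where x = u - s moves s onto B_j."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = rng.gauss(0.0, math.sqrt(Q))   # channel state with variance Q
        j = rng.randint(1, J)              # uniformly chosen message
        u = nearest_in_subset(s, j, J, B)  # target point in B_j
        x = u - s                          # transmitted input
        total += x * x
    return total / trials


if __name__ == "__main__":
    J, B, Q = 4, 2.0, 100.0
    print("simulated power:", concentration_power(J, B, Q))
    print("(J*B)^2 / 12   :", (J * B) ** 2 / 12)
```

With J = 4, B = 2 and Q = 100 the two printed numbers should be close, which is the approximation used in the text to relate J, B and P.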

Note that this method realizes that the transmitted symbol $U_k = X_k + S_k$
is correlated with the state $S_k$.

III.  Numerical  evaluation

For  Q  =  100,  N  =  1  for  0  <  P  <  300  we  have  computed  the  lower 
and  upper  bound  according  to  (1)  and  the  rates  achieved  by  the  can¬ 
cellation  and  concentration  techniques  from  [3].  In  the  concentration 
case  A  =  20  was  chosen.  The  curves  are  shown  in  the  first  figure. 
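As a rough illustration of the range covered by this evaluation, the sketch below tabulates two reference curves for Q = 100 and N = 1. The lower curve treats the state as additional noise, $\frac{1}{2}\ln\frac{P+Q+N}{Q+N}$. The upper curve is taken here to be the interference-free Gaussian capacity $\frac{1}{2}\ln(1 + P/N)$, which is an assumption about the upper bound in (1). The function names are hypothetical, and the cancellation and concentration rates of [3] are not reproduced.

```python
import math


def lower_bound(P, Q, N):
    """Rate in nats when the state S is simply treated as extra noise."""
    return 0.5 * math.log((P + Q + N) / (Q + N))


def upper_bound(P, N):
    """Assumed upper bound: capacity in nats of the channel without the state."""
    return 0.5 * math.log(1.0 + P / N)


if __name__ == "__main__":
    Q, N = 100.0, 1.0
    for P in (1, 10, 50, 100, 200, 300):
        print(f"P={P:4d}  lower={lower_bound(P, Q, N):.4f}  upper={upper_bound(P, N):.4f}")
```

The gap between the two curves is large for Q much greater than N, which is the regime where exploiting knowledge of the state matters most.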


Moreover for J = 2, 3, ..., 10 and B = 2, 4, 6, ... we have
determined the channel with input U taking values from the alphabet
{1, 2, ..., J} and output Y. The mutual information I(U; Y) of this
channel is computed assuming that U is uniformly distributed over
{1, 2, ..., J}. Also the average concentration power is determined.
This  leads  for  each  value  of  J  to  a  curve  representing  the  power-rate 
pairs  for  several  values  of  B.  These  curves  are  in  the  second  figure. 
Abstract  —

We  determine  the  capacity  regions  for  a  class 
of time-varying multiple-access channels (TVMACs),
when  the  underlying  channel  state  evolves  in  time  ac¬ 
cording  to  a  probability  law  which  is  known  to  the 
transmitters  and  the  receiver.  Additionally,  the  trans¬ 
mitters  and  the  receiver  have  access  to  varying  de¬ 
grees  of  channel  state  information  (CSI)  concerning 
the  condition  of  the  channel.  Discrete-time  channels 
with  finite  input,  output  and  state  alphabets  are  con¬ 
sidered  first.  Next,  in  order  to  reduce  transmitter 
complexity,  we  restrict  the  encoders  at  each  time  in¬ 
stant  to  depend  only  on  a  limited  extent  of  CSI.  As 
a  special  case,  we  consider  a  memoryless  TVMAC, 
with the channel state process being a time-invariant,
indecomposable,  aperiodic  Markov  chain.  We  then 
study  a  time-varying  (multiple-access)  fading  channel 
(TVFC)  with  additive  Gaussian  noise,  when  various 
amounts  of  CSI  are  provided  to  the  transmitters  while 
perfect  CSI  is  available  to  the  receiver,  and  the  fades 
are  assumed  to  be  stationary  and  ergodic. 

I.  Preliminaries 

Consider first a discrete-time two-sender TVMAC with (finite)
input alphabets $\mathcal{X}_1, \mathcal{X}_2$, (finite) output alphabet $\mathcal{Y}$ and
(finite) “state space” $\mathcal{S}$. The probability law of the channel is
characterized by a sequence of (known) transition pmf’s

$$\mathbf{W} = \left\{ W(y^n \mid x_1^n, x_2^n, s^n) :
x_1^n \in \mathcal{X}_1^n,\ x_2^n \in \mathcal{X}_2^n,\ s^n \in \mathcal{S}^n,\ y^n \in \mathcal{Y}^n \right\}_{n=1}^{\infty}, \qquad (1.1)$$

and a (known) probability law $P_S$ governing the $\mathcal{S}$-valued state
process $\{S_t\}_{t=1}^{\infty}$, which allows the state at any time to depend
on the previous states but not on the previous channel inputs
or outputs.

Let $\mathcal{E}_1$, $\mathcal{E}_2$ and $\mathcal{D}$ be finite sets and
$h_1 : \mathcal{S} \to \mathcal{E}_1$, $h_2 : \mathcal{S} \to \mathcal{E}_2$,
and $g : \mathcal{S} \to \mathcal{D}$ be mappings which are used to describe the
CSI available to the two senders and the decoder, respectively.
Thus, at each time instant t, the encoder for sender-1 (resp.
sender-2) is provided with the instantaneous CSI $e_{1,t} = h_1(s_t)$
(resp. $e_{2,t} = h_2(s_t)$) while the decoder is provided with CSI
$d_t = g(s_t)$, all in a causal manner. The capacity region of the
TVMAC in (1.1) for the average probability of error criterion
will be denoted by $\mathcal{C}$.

In order to reduce encoder complexity, we shall often
consider situations in which we restrict the encoder for
sender-j, j = 1, 2, to depend only on the limited CSI
$(e_{j,\max\{1,t-k+1\}}, \ldots, e_{j,t})$ at time t, for some fixed integer
$k \ge 1$. The capacity region corresponding to this “encoder
restriction of window-k” is denoted by $\mathcal{C}(k)$.
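The window restriction is easy to state in code. The following is a minimal sketch, with a hypothetical helper `windowed_csi` and a toy CSI map that are not part of the paper: at time t (counted from 1) the encoder sees only the last k CSI values, or fewer near the start of transmission.

```python
def windowed_csi(states, h, t, k):
    """Return (e_{max(1, t-k+1)}, ..., e_t), where e_i = h(s_i).

    states : list of channel states s_1, s_2, ... (stored 0-indexed)
    h      : CSI map applied to each state (hypothetical example map below)
    t      : current time instant, counted from 1
    k      : window length, k >= 1
    """
    if k < 1 or t < 1 or t > len(states):
        raise ValueError("need k >= 1 and 1 <= t <= len(states)")
    start = max(1, t - k + 1)
    return tuple(h(states[i - 1]) for i in range(start, t + 1))


if __name__ == "__main__":
    states = [0, 1, 2, 3, 4]           # toy state sequence s_1..s_5
    coarse = lambda s: s % 2           # toy CSI map: only the parity of the state
    print(windowed_csi(states, coarse, t=1, k=3))  # only one value is available
    print(windowed_csi(states, coarse, t=4, k=3))  # CSI for times 2, 3, 4
```

Setting k = 1 gives an encoder that depends on the instantaneous CSI only, while letting k grow with t removes the restriction.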

We shall also consider the multiple-access time-varying fading
channel (TVFC) model, in which the received $\mathbb{R}$-valued
signal is given by

$$Y_t = \sum_{j=1}^{2} S_{j,t}\, x_{j,t} + N_t, \quad t \ge 1, \qquad (1.2)$$

where $\{x_{j,t}\}_{t=1}^{\infty}$ and $\{S_{j,t}\}_{t=1}^{\infty}$ are the $\mathbb{R}$-valued transmitted
signal and $\mathbb{R}^+$-valued fade of sender-j, j = 1, 2, respectively,
and $\{N_t\}_{t=1}^{\infty}$ is i.i.d. Gaussian noise with mean zero
and variance $\sigma_N^2$. The fading processes $\{S_{j,t}\}_{t=1}^{\infty}$, j = 1, 2,
are assumed to be jointly stationary and ergodic, though not
necessarily independent of each other; they are independent
of $\{N_t\}_{t=1}^{\infty}$. The state of the channel at time t, $t \ge 1$, is
$S_t = (S_{1,t}, S_{2,t}) \in \mathcal{S} = (\mathbb{R}^+)^2$. The CSI available to sender-j is
given by a mapping $h_j : (\mathbb{R}^+)^2 \to \mathcal{E}_j$, where $\mathcal{E}_j$ can be an
arbitrary subset of $\mathbb{R}$ which is not necessarily finite. The decoder
is assumed to possess perfect CSI, i.e., $d_t = s_t$, $t \ge 1$.
Sender-j, j = 1, 2, is assumed to be subject to an input (average)
power constraint $\mathcal{P}_j$.

II.  Results 

Our results include a determination of the following capacity
regions with CSI at the encoders and decoder. Some
of them constitute generalizations to the multiple-access
situation of results for single-sender time-varying channels in
[Caire-Shamai, IT Sept. 1999].

•  The  capacity  region  C  of  the  TVMAC  in  (1.1)  (this  follows 
as  a  straightforward  consequence  of  the  approach  in  [Han, 
IT  Nov.  1998]),  including  in  the  special  case  when  the  CSI 
available  to  the  encoders  is  contained  in  that  available  to  the 
decoder. 

•  The capacity region $\mathcal{C}$ of the TVMAC in (1.1) when the
channel is memoryless and when the state process $\{S_t\}_{t=1}^{\infty}$
is a time-invariant, indecomposable, aperiodic Markov chain
(TIAMC), under suitable sufficient conditions for the invariance
of $\mathcal{C}$ with respect to the distribution of $S_1$.

•  The capacity region $\mathcal{C}(k)$ under an encoder restriction of
window-k, when the TVMAC and the state process are as in
the previous situation; also, the capacity region $\mathcal{C}$ when the
distribution of $S_1$ is the stationary distribution of the TIAMC.

•  The capacity regions $\mathcal{C}(k)$ and $\mathcal{C}$, when the TVMAC in
(1.1) is memoryless and the state process $\{S_t\}_{t=1}^{\infty}$ is stationary
and ergodic.

•  The capacity regions $\mathcal{C}(k)$ and $\mathcal{C}$ of the TVFC in (1.2),
as well as implications when the fades are Rayleigh and varying
degrees of CSI are provided to the transmitters.
Abstract — Feedback strategies are presented for
the K-user white Gaussian multiple-access channel.
The strategies are based on the discrete Fourier transform
of length K and achieve the sum-rate capacity
for equal user powers if the signal-to-noise ratio is
large enough.

I.  Introduction  and  Model 

The  capacity  region  of  the  2-user  white  Gaussian  multiple- 
access  channel  with  feedback  (MAC-FB)  was  determined 
in [1]. However, it seemed difficult to generalize this result
to  more  than  2  users.  We  show  that  one  can  partially  circum¬ 
vent  the  difficulties  by  considering  a  complex  noise  model  and 
by  using  the  discrete  Fourier  transform. 

The K-user complex white Gaussian MAC-FB is a K + 1
terminal channel with K inputs $X_1, X_2, \ldots, X_K$ and one output
$Y = \left(\sum_{k=1}^{K} X_k\right) + Z$, where Z is a circularly symmetric
complex Gaussian random variable with variance $\sigma^2 = E[|Z|^2]$.
Terminal k transmits a $B_k$ bit message to the receiving
terminal in N channel uses, so that its rate is $R_k = B_k/N$
bits per use, k = 1, ..., K. At time n the transmitting terminals
use the past n − 1 channel outputs $Y^{n-1}$ to encode their
messages. The power constraints on the transmissions are
$\sum_{n=1}^{N} E[|X_{kn}|^2]/N \le P_k$ for some constants $P_k$, k = 1, ..., K.

II.  Transmission  and  Reception 

Terminal k transmits by mapping its message onto a point $\theta_k$
in the complex plane and by correcting the receiver’s estimate
of this point. More precisely,


where $\sigma_{k(n-1)}^2 = E[|\epsilon_{k(n-1)}|^2]$ is the variance of the receiver’s
error $\epsilon_{k(n-1)} = \hat{\theta}_{k(n-1)} - \theta_k$ after n − 1 channel uses and $m_{kn}$ is
a modulation coefficient. We call this transmission technique
modulated estimate correction (MEC) [2]. We will choose the
$m_{kn}$ to be entries from the length K discrete Fourier transform
matrix, i.e., $m_{kn} = e^{j 2\pi (k-1)(n-1)/K}$.
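The modulation coefficients can be tabulated directly. The sketch below uses hypothetical helper names and assumes k and n are counted from 1. It builds the K × K matrix of $m_{kn}$ values and checks the property that makes the DFT choice attractive: the coefficient sequences of different users are orthogonal over K consecutive channel uses.

```python
import cmath


def mec_coefficient(k, n, K):
    """m_kn = exp(j * 2 * pi * (k - 1) * (n - 1) / K), with k, n counted from 1."""
    return cmath.exp(2j * cmath.pi * (k - 1) * (n - 1) / K)


def dft_matrix(K):
    """Rows are users k = 1..K, columns are channel uses n = 1..K."""
    return [[mec_coefficient(k, n, K) for n in range(1, K + 1)]
            for k in range(1, K + 1)]


def inner(u, v):
    """Complex inner product sum_n u_n * conj(v_n)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


if __name__ == "__main__":
    K = 4
    M = dft_matrix(K)
    print("|<row 2, row 3>| =", abs(inner(M[1], M[2])))  # close to 0
    print("<row 2, row 2>   =", inner(M[1], M[1]))       # close to K
```

For n larger than K the coefficients simply repeat with period K, since the exponent depends on n − 1 only modulo K.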

We let the receiving terminal estimate the $\theta_k$ by using a
linear minimum-mean square error (LMMSE) estimator. The
result is the K recursions

$$\sigma_{kn}^2 = \sigma_{k(n-1)}^2 \cdot V_{Y_n \mid X_{kn}} / V_{Y_n}, \qquad (2)$$

where $V_{Y_n}$ is the variance of $Y_n$ and $V_{Y_n \mid X_{kn}}$ is the variance
of $Y_n$ given $X_{kn}$. Consider also the correlation coefficients

¹This work was performed while the author was with Endora
Tech AG, Hirschgasslein 40, 4051 Basel, Switzerland.

Fig. 1: Sum-rate achievable with P = 10 and $\sigma^2 = 1$, plotted against
the number of users K. The rate units are nats/use/dimension.

and let $Q^{(n)}$ be the K × K matrix with $\rho_{k\ell n}$ in the $(k,\ell)$th
position. One finds that


where $c_n$ is the column vector of the

$$c_{kn} = E[\epsilon_{k(n-1)} Y_n^*] / \sqrt{\sigma_{k(n-1)}^2}, \qquad (5)$$

$\mathbf{V}_{Y_n \mid X_n}$ is the column vector of the $V_{Y_n \mid X_{kn}}$, the “⊘” in front
of the fraction denotes term-by-term division (a “Hadamard
quotient”), and the square-root in the denominator denotes
taking term-by-term square roots. The recursion (4) yields
good  rates  for  a  wide  variety  of  power  constraints  and  rates. 
Moreover, if $P_k = P$ for all k the analysis can be simplified
to an eigenvalue recursion. Numerical results are depicted in
Fig. 1 for P = 10 and $\sigma^2 = 1$, where the sum-rate capacity
is achieved for K < 8. It can be shown that, in general,
MEC  combined  with  LMMSE  estimation  achieves  the  sum- 
rate  capacity  for  equal  user  powers  if  the  signal-to-noise  ratio 
is  beyond  some  finite  number  that  depends  on  K  only. 
Abstract — The achievability of the Wyner-Ziv theorem
[1] for the Gaussian case is shown using only
geometric  arguments.  The  motivation  for  this  is  to 
inspire  the  construction  of  practical  codes  based  on 
this  [2]. 

I.  Introduction 

Distributed  source  coding  deals  with  the  encoding  of  cor¬ 
related  sources  that  do  not  communicate  with  one  another. 
The  concept  of  distributed  source  coding  has  been  considered 
for  continuous  sources  with  a  fidelity  criterion  in  [1]  where 
X  and  Y  are  sources  such  that  the  decoder  has  (lossless)  ac¬ 
cess  to  Y,  and  the  encoder  compresses  X  using  the  fact  that 
the  decoder  knows  Y.  It  was  shown  that  when  X  and  Y 
are  jointly  Gaussian  with  mean  squared  error  as  the  fidelity 
criterion,  it  is  possible  to  compress  X  as  efficiently  as  the 
case  when  both  the  encoder  and  the  decoder  have  access  to 
Y, i.e., $R_{X|Y}(D) = R^*(D)$ [1]. In [2], a construction of generalized
coset codes has been proposed based on the channel
coding principles. Using geometrical arguments we establish
the achievability of the following rate distortion bound with
side information for X:

$$R^*(d^*) = \frac{1}{2}\log\left(\frac{\sigma_n^2}{(1+\sigma_n^2)\,d^*}\right), \qquad (1)$$

where X is i.i.d. Gaussian with zero mean and unit variance,
$d^*$ is the reconstruction fidelity, and the side information is
given by Y = X + N, where N is i.i.d. Gaussian with zero
mean and with variance $\sigma_n^2$ and independent of X.
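A small numerical check may help fix the quantities involved. The sketch below assumes the standard Gaussian Wyner-Ziv rate, $\frac{1}{2}\log(\sigma_{X|Y}^2 / d^*)$ with $\sigma_{X|Y}^2 = \sigma_n^2/(1+\sigma_n^2)$ for a unit-variance source, and takes the logarithm to base 2 (the base is an assumption, as are the function names).

```python
import math


def conditional_variance(sigma_n2):
    """Var(X | Y) for X ~ N(0, 1) and Y = X + N with N ~ N(0, sigma_n2)."""
    return sigma_n2 / (1.0 + sigma_n2)


def wyner_ziv_rate(sigma_n2, d):
    """Rate in bits per sample needed for distortion d with side information Y.

    The rate is zero once d reaches Var(X | Y), because the decoder can then
    meet the fidelity target from Y alone.
    """
    if d <= 0:
        raise ValueError("distortion must be positive")
    v = conditional_variance(sigma_n2)
    return max(0.0, 0.5 * math.log2(v / d))


if __name__ == "__main__":
    for d in (0.5, 0.25, 0.125):
        print(f"sigma_n^2=1, d={d}: rate={wyner_ziv_rate(1.0, d):.3f} bits")
```

For $\sigma_n^2 = 1$ the conditional variance is 1/2, so each halving of the target distortion below 1/2 costs half a bit per sample.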


II.  Geometric  Derivation  of  Wyner-Ziv  bound 

We encode the source X in blocks of length L. Randomly
distribute $2^{LR_1}$ codewords independently and uniformly on
the surface of an L-dimensional sphere $S^L(\sqrt{1-\theta^2})$ of radius
$\sqrt{1-\theta^2}$, where $R_1$ is a real number which depends on $\theta$ ($\theta$ = …).
Let $\mathcal{C}$ denote this set of codewords. Randomly
choose $2^{LR_2}$ codewords independently and uniformly from this
set $\mathcal{C}$, and give an index to this subset called a coset. Keep
repeating this until all the codewords are exhausted. Thus,
there are $2^{L(R_1-R_2)} = 2^{LR}$ indices.

Encoding involves finding a codeword from the space of
codewords satisfying a distance criterion (within distance $\theta$)
and sending the index of the coset containing the encoded
codeword to the decoder. If none of the codewords satisfies
the criterion, then an error is declared. We derive a lower
bound on $R_1$. The basic idea behind the proof is the following:
the encoder chooses a codeword to represent X, and nature
chooses the side information Y independent of the encoder
using a “channel” p(y|x) on X. Yet there exists a strong
correlation between the quantized codeword and the side
information, which needs to be exploited. This key result (also
known as the Markov Lemma [1]) used in the geometric
arguments is given by the following theorem.

¹This work was supported in part by DARPA Grant F29601-99-1-0169
and NSF (CAREER) Grant MIP 97-03181.

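Theorem: For any $\epsilon > 0$ (sufficiently close to 0), $\exists\, L_1(\epsilon)$
such that for any $L > L_1(\epsilon)$,
$P\left[\,\left|\, \|\tilde{Y}\|^2 - \theta^2 - \sigma_n^2 \,\right| > \epsilon\,\right] < \epsilon$,
where $\tilde{Y} = Y - W$, and Y, W are the side information and quantized
codeword vectors respectively.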

The  first  part  of  the  decoder  involves  finding  a  “suitable” 
codeword  in  the  given  coset.  If  either  more  than  one  or  none 
of  the  codewords  satisfies  the  distance  criterion,  then  an  error 
is declared. We derive an upper bound on $R_2$. The second
part  of  the  decoder  then  gets  the  reconstruction  vector  as  a 
function  of  the  encoded  codeword  and  the  side  information 
vector.  It  is  shown  geometrically  that  the  probabilities  of  the 
error  events  can  be  made  arbitrarily  small  for  large  L.  The 
geometry  for  the  ideal  case  of  very  large  L  is  shown  in  Fig.  1. 


Figure  1:  OA,OC  and  OH  represent  the  codeword  and  ob¬ 
served  side  information  and  the  reconstruction  vectors  respec¬ 
tively.  A,  H  and  C  are  collinear.  OH  gives  the  minimum 
distortion  given  that  OA  and  OC  are  codeword  and  the  side 
information  respectively.  FG  and  DE  represent  two  spheres. 
Abstract — We will discuss the generation of dc-free
runlength-limited (DCRLL) sequences. We propose
to employ standard RLL codes, where dc-control
is effectuated by multiplexing the source data or the
encoded data with dc-control bits. The dc-control bits
offer the degree of freedom required to shape the
spectrum. It will be shown that a new technique, called
parity  preserving  assignment,  will  offer  great  benefits  over 
other  constructions. 

I.  Introduction 

The  design  of  dc-free  runlength-limited  (DCRLL)  codes 
can,  at  least  in  principle,  be  systematically  accomplished 
by  the  many  design  techniques  published  [1].  Unfortu¬ 
nately,  the  design  of  a  DCRLL  code  with  a  rate  close 
to  the  Shannon  capacity  of  the  constrained  channel,  is 
severely  hampered  by  the  large  number  of  states  of  the 
finite-state  machine  which  models  the  channel  constraints 
at  hand.  The  large  number  of  states  of  the  underlying 
FSM,  can,  at  least  in  principle,  be  handled  by  buying  a 
larger  computer,  but  the  insight  required  is  too  easily  lost. 
Essentially,  there  are  two  systematic  design  approaches 
that  emerged  in  the  literature. 

The  first  method  uses  a  standard  method,  such  as  the 
ACH  algorithm  to  design  an  RLL  code.  In  the  final  stage 
of  the  ACH  algorithm  we  end  up  with  a  graph  with  the 
property that from any state of the graph there are at
least $2^m$ (m is assumed to be the source word length)
outgoing  edges.  These  surplus  edges  are  used  as  alterna¬ 
tive  codewords  that  can  be  used  for  dc-control.  The  rate 
8/16,  (2,10)  EFMPlus  code  is  an  example  of  a  DCRLL 
code  used  in  practice  (DVD)  that  was  designed  according 
to  these  guidelines  [1]. 

In  the  second  method,  a  given,  state-of-the-art,  RLL 
code,  is  used  to  generate  RLL  sequences.  The  sequences 
generated  under  the  coding  rules  of  said  code  are  mul¬ 
tiplexed  with  dc-control  bits  for  minimizing  the  low- 
frequency  components.  The  user  data  or  alternatively 
the encoded data are partitioned into segments of v bits.
Between two consecutive v-bit segments β dc-control bits
are inserted, and the β dc-control bits, in turn, are chosen
to minimize the low-frequency components.

II.  Codes  with  parity  preserving  word  assignment

In order to make it possible to efficiently control the dc-content
in the source data level mode, we have made the
assignment between source words and codewords in such
a way that the parity of both source word and its assigned
codeword are the same. The parity, P, of an n-bit word
$(x_1, \ldots, x_n)$, $x_i \in \{0,1\}$, (either source or codewords) is
defined by

$$P = \sum_{i=1}^{n} x_i \bmod 2.$$

Table 1: Variable-length synchronous rate 2/3, (1,∞) code
with parity preserving assignment.

    Data         Code
    00     <->   000
    01     <->   010
    10     <->   100
    1100   <->   001010
    1101   <->   001000
    1110   <->   101010
    1111   <->   101000

In other words, if the source word has an even (or odd)
number of ’one’s then its channel representation also has
an even (or odd) number of ’one’s. A code with a parity
preserving assignment has the virtue that, when it is
used in conjunction with dc-control bits at data level,
setting an even (or odd) number of ’one’s at data level
will result in an even (or odd) number of ’one’s at code
level. This leads, as we will demonstrate, to an efficient
dc-control.

The variable length rate 2/3, (1, ∞) code shown in Table 1
is an example of a code with the parity preserving
property.  It  can  easily  be  verified  that  indeed  the  as¬ 
signment  is  parity  preserving.  In  the  presentation,  we 
will  show  the  difference  in  performance  between  various 
DCRLL  codes. 
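The parity preserving property of Table 1 can be checked mechanically. The sketch below is illustrative only: the dictionary transcribes the table with the last data word taken to be 1111, and the helper names and the greedy variable-length encoder are hypothetical, not the authors' implementation.

```python
# Table 1 as a mapping from source words to codewords (last data word read as 1111).
TABLE_1 = {
    "00": "000", "01": "010", "10": "100",
    "1100": "001010", "1101": "001000",
    "1110": "101010", "1111": "101000",
}


def parity(word):
    """Parity of a binary string: number of ones modulo 2."""
    return word.count("1") % 2


def is_parity_preserving(table):
    """True if every source word has the same parity as its codeword."""
    return all(parity(data) == parity(code) for data, code in table.items())


def encode(bits, table=TABLE_1):
    """Greedy variable-length encoding: try a 2-bit word, then a 4-bit word."""
    out, i = [], 0
    while i < len(bits):
        for length in (2, 4):
            word = bits[i:i + length]
            if len(word) == length and word in table:
                out.append(table[word])
                i += length
                break
        else:
            raise ValueError("input cannot be parsed into source words")
    return "".join(out)


if __name__ == "__main__":
    data = "011100"
    code = encode(data)
    print(is_parity_preserving(TABLE_1), code, parity(data) == parity(code))
```

Because every codeword in the table ends in 0 and contains no two adjacent ones, any concatenation of codewords also keeps at least one zero between ones, which is the d = 1 runlength constraint.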
Abstract: Based on the ideal autocorrelation property
of  sequences  with  odd  length  n,  a  general  method  to 
generate  binary  sequences  of  length  N=2n  with  optimal 
frame  synchronization  property  is  studied.  The  sequence 
generation  method  may  be  useful  in  the  field  of 
communication  systems. 

I.  Introduction 

The  importance  of  sequences  with  optimal  frame 
synchronization  property  is  increasing  since  by  employing 
two  thresholds  at  the  output  of  correlator  of  such  sequences, 
we  can  double-check  frame  synchronization  and  thus 
improve  the  frame  synchronization  performance  [1].  If  the 
autocorrelation  function  of  a  sequence  has  double  maximum 
values  equal  in  magnitude  and  opposite  polarity  at  zero  and 
middle  shifts,  and  further  if  the  function  has  the  lowest  out- 
of-phase  values  except  for  at  middle  shift,  then  such  a 
sequence  is  said  to  have  “optimal  frame  synchronization 
property”. A sequence is said to have “ideal autocorrelation
property” if the out-of-phase autocorrelation value of the
sequence is all “−1” or “1”. Based on sequences with ideal
autocorrelation function, a general method to generate the
optimal frame synchronization sequences of length N = 2n
is proposed. Since all the binary maximal length sequences
of period $n = 2^r - 1$ have the out-of-phase autocorrelation
value “−1”, we can easily construct the desired optimal
frame  synchronization  sequences  by  using  the  proposed 
method. 


II.  Definitions  and  basic  properties

Let $S = (s_i) = (s_0, s_1, \ldots, s_{n-1})$ be an n-tuple binary
sequence over GF(2) = {0,1} and T be a cyclic shift left
operator such that $TS = (s_1, s_2, \ldots, s_{n-1}, s_0)$. The periodic
autocorrelation function is defined by [2]

$$R_S(\tau) = \sum_{i=0}^{n-1} \chi(s_i)\,\chi\!\left(s_{(i+\tau)(\mathrm{mod}\ n)}\right) \qquad (1)$$

where $\chi(\cdot)$ is the unique isomorphism of the addition
group {0,1} onto the multiplication group {−1,1}. We call
$R_S(0)$ the “in-phase autocorrelation value” and the $R_S(\tau)$’s
($\tau \not\equiv 0$) the “out-of-phase autocorrelation values”. The
sequence S is said to have the “ideal autocorrelation
property” if its periodic autocorrelation function has the
lowest out-of-phase autocorrelation value of “1” or “−1”.

III.  Design  of  optimal  frame  synchronization  sequences

Using the following important results, we can construct
the optimal frame synchronization sequences.

Theorem 1: Let $S = (s_i)$ be an n-tuple binary sequence
over GF(2) with ideal autocorrelation function, where
$n = 2l + 1$, $l = 1, 2, 3, \ldots$. If a sequence $A = (a_i)$ is
constructed by the following relationships,

$$a_{2z\,(\mathrm{mod}\ N)} = s_{z\,(\mathrm{mod}\ n)} \qquad (2)$$

$$\chi\!\left(a_{(2z+1)(\mathrm{mod}\ N)}\right) = -\chi\!\left(s_{(z+(n+1)/2)(\mathrm{mod}\ n)}\right) \qquad (3)$$

then the sequence A has the following optimal frame
synchronization property.

$$R_A(\tau) = \begin{cases}
N, & \tau \equiv 0 \pmod{N} \\
-N, & \tau \equiv n \pmod{N} \\
2, & \tau\ \text{odd},\ \tau \not\equiv n \pmod{N} \\
-2, & \tau\ \text{even},\ \tau \not\equiv 0 \pmod{N}
\end{cases} \quad \text{for positive odd } l \qquad (4)$$

$$R_A(\tau) = \begin{cases}
N, & \tau \equiv 0 \pmod{N} \\
-N, & \tau \equiv n \pmod{N} \\
-2, & \tau\ \text{odd},\ \tau \not\equiv n \pmod{N} \\
2, & \tau\ \text{even},\ \tau \not\equiv 0 \pmod{N}
\end{cases} \quad \text{for positive even } l \qquad (5)$$

From Theorem 1 we observe that the sequences with optimal
autocorrelation property of even period can be generated by
(2) and (3) based on binary sequences with ideal
autocorrelation property.

Corollary 1: Let S be a maximal length sequence over
GF(2) of period $n = 2^r - 1$, $r \ge 2$. Then the
autocorrelation function of A with period N = 2n,
which is generated by (2) and (3), becomes (4).
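The definitions and the construction can be tried out on a short example. The sketch below is an illustration under stated assumptions: relations (2) and (3) are read as $a_{2z} = s_z$ and $a_{2z+1}$ equal to the complement of $s_{z+(n+1)/2}$, the length-7 maximal length sequence is generated from the recurrence $s_{i+3} = s_{i+1} + s_i$, and all function names are hypothetical.

```python
def chi(bit):
    """Map GF(2) onto {+1, -1}: 0 -> +1, 1 -> -1."""
    return 1 - 2 * bit


def autocorrelation(seq, tau):
    """Periodic autocorrelation R(tau) = sum_i chi(s_i) * chi(s_{(i + tau) mod n})."""
    n = len(seq)
    return sum(chi(seq[i]) * chi(seq[(i + tau) % n]) for i in range(n))


def frame_sync_sequence(s):
    """Build A of length N = 2n from S of odd length n.

    Even positions copy S; odd positions carry the complement of S shifted
    by (n + 1) / 2, which is how relations (2) and (3) are read here.
    """
    n = len(s)
    if n % 2 == 0:
        raise ValueError("n must be odd")
    N = 2 * n
    a = [0] * N
    for z in range(n):
        a[(2 * z) % N] = s[z]
        a[(2 * z + 1) % N] = 1 - s[(z + (n + 1) // 2) % n]
    return a


# Maximal length sequence of period 7 from s_{i+3} = s_{i+1} + s_i (mod 2).
M_SEQ_7 = [1, 0, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    print([autocorrelation(M_SEQ_7, t) for t in range(7)])
    a = frame_sync_sequence(M_SEQ_7)
    print([autocorrelation(a, t) for t in range(len(a))])
```

For this example n = 7, so l = 3 is odd and the second printed list should follow the pattern of case (4): N at shift 0, −N at shift n, and ±2 elsewhere.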
Abstract — We consider the problem of generating
random numbers with a specified distribution by parsing
a sequence of general coin tosses. We suggest a
method  based  on  homophonic  coding  that  allows  the 
number  of  tosses  to  approach  arbitrarily  close  to  the 
lower  bound  with  exponentially  less  computational  ef¬ 
fort  than  other  known  methods. 

I.  Introduction 

The  statement  of  the  problem  of  random  number  gener¬ 
ation  is  to  produce  a  random  number  that  obeys  some  pre¬ 
scribed  target  distribution  when  having  as  an  input  some 
known  random  process,  e.g.,  repeated  tosses  of  a  coin.  A 
general coin is assumed to take on values from {1, 2, ..., M}
with probabilities p(1), p(2), ..., p(M). The additional natural
setting to the problem is to minimize the (average) number
of tosses required for generation of one random number.

Knuth  and  Yao  [1]  suggested  a  method  for  generating  i.i.d. 
random  numbers  with  specified  probability  distribution  by 
making use of fair coin tosses. They constructed a parse tree
which minimizes the average number of coin tosses and proved
that this minimum is at least H and less than H + 2,
where H is the Shannon entropy of the target distribution.
However,  when  a  sequence  of  random  numbers  is  to  be  gener¬ 
ated,  the  size  of  the  tree  grows  exponentially  and  the  method  is 
intractable.  Han  and  Hoshi  [2]  suggested  an  interval  algorithm 
which  requires  only  linear  growth  of  the  memory  size  as  the 
length  of  the  sequence  increases,  and  involves  multiplication- 
type  operations  with  linearly  growing  precision  which  results 
in  roughly  quadratic  growth  of  computation  time.  They  also 
used  general  coin  tosses  as  an  input. 

To  construct  a  class  of  less  complex  methods  of  random 
number  generation  we  suggest  an  approach  based  on  homo¬ 
phonic  coding.  We  show  that  using  the  ACIS-method  [3] 
requires  only  logarithmic  growth  of  the  memory  size  and 
roughly  square-logarithmic  growth  of  computation  time  when 
approaching  the  lower  bound  for  the  number  of  coin  tosses. 

II.  The  Essence  of  the  Approach 

In homophonic coding, a message $x_1 x_2 x_3 \ldots$ with known
probabilities of symbols is converted into a code sequence
$c_1 c_2 c_3 \ldots$, $c_i \in \{0,1\}$, whose symbols are equiprobable and
independent, i.e., indistinguishable from random bits. The
decoder receives the sequence $c_1 c_2 c_3 \ldots$ and also knows the
source probability distribution. Having these data, it reconstructs
the initial message $x_1 x_2 x_3 \ldots$. Obviously, if instead
of $c_1 c_2 c_3 \ldots$ the decoder is given a sequence of fair coin tosses
$r_1 r_2 r_3 \ldots$ and a target distribution, it will produce a sequence
of random numbers $y_1 y_2 y_3 \ldots$ that obey this distribution.

Define the loss of generator to be the mean per random
variable excess of the number of coin tosses over the random
variable entropy. This loss is equal to the redundancy of a
corresponding homophonic encoding that would produce such
tosses. The ACIS-encoding [3] is to date the fastest homophonic
coding method that allows for arbitrarily small redundancy.
Hence, the ACIS-decoding can provide arbitrarily
small loss when used as a random number generator.

¹This work was supported by RFBR Grant 99-01-00586.

Now let a random variable X take on values from
{1, 2, ..., M} with arbitrary (but known) probabilities. Suppose
we are given a sequence of the random variable values
$X^m = x_1 x_2 \ldots x_m$. It is required to generate a sequence of random
numbers $Y^n = y_1 y_2 \ldots y_n$ that represent values of a
random variable Y that obeys some target distribution over
{1, 2, ..., N}. Denote by H(X) and H(Y) the entropies of X
and Y.

To  solve  this  general  problem  we  apply  ACIS-encoding  to 
Xm  and  transfer  the  code  bits  produced  to  ACIS-decoder  that 
operates  under  the  distribution  for  Y .  But  the  encoder  needs 
an  extra  source  of  random  bits  in  order  to  make  homophone 
selection.  To  obtain  a  solution  in  a  standard  framework,  i.e., 
without  any  extra  source  of  randomness,  we  replace  the  source 
of  random  bits  with  an  auxiliary  random  bit  generator  and  let 
a  small  fraction  of  the  source  numbers  fork  to  the  generator. 
Of  course,  this  generator  cannot  be  based  on  homophonic  cod¬ 
ing.  But  we  could  expect  that  even  a  relatively  more  complex 
generator  would  not  increase  the  overall  complexity  since  the 
number  of  random  bits  it  must  generate  can  be  made  arbi¬ 
trarily  small. 

Theorem: Let the scheme based on ACIS encoding and
decoding be used to generate a sequence of random numbers
$Y^n$, $n \to \infty$, by parsing a sequence of general coin tosses $X^m$.
Then the expected number of values of X required to generate
one value of Y is

$$\lim_{n \to \infty} \frac{E(m)}{n} = \frac{H(Y)}{H(X)} + \epsilon$$

and the loss $\epsilon$ can be made arbitrarily small at the expense of
the memory size S and computation time T growing as

$$S = O\!\left(\log\frac{1}{\epsilon}\right), \qquad
T = O\!\left(\log\frac{1}{\epsilon}\,\log\log\frac{1}{\epsilon}\,\log\log\log\frac{1}{\epsilon}\right).$$
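The entropy ratio H(Y)/H(X) appearing in the theorem is the quantity any generator has to beat, and it is simple to compute for concrete distributions. The sketch below is a hedged illustration with hypothetical function names; it does not implement ACIS coding, only the entropy bookkeeping.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a finite distribution (bits by default)."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability distribution")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def inputs_per_output(source_probs, target_probs):
    """H(Y) / H(X): asymptotic minimum number of source symbols per target symbol."""
    return entropy(target_probs) / entropy(source_probs)


if __name__ == "__main__":
    fair_coin = [0.5, 0.5]
    biased_coin = [0.9, 0.1]
    fair_die = [1.0 / 6] * 6
    print("fair coin   -> die:", inputs_per_output(fair_coin, fair_die))
    print("biased coin -> die:", inputs_per_output(biased_coin, fair_die))
```

A fair coin needs about 2.585 tosses per die roll in the limit, while a heavily biased coin carries less entropy per toss and therefore needs correspondingly more.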
Abstract — In this work we deal with the problem
of linear multiuser detection for asynchronous
DS/CDMA  systems.  We  introduce  a  new  Mean- 
Output-Energy  cost  function,  whose  constrained  min¬ 
imization  leads  to  two  new  linear  multiuser  detectors, 
exploiting  the  information  contained  in  the  pseudoau¬ 
tocorrelation  of  the  observables,  and  which  generalize 
and  outperform  the  classical  decorrelating  and  mini¬ 
mum  mean  square  error  receivers. 

I.  Introduction 

The  decorrelating  detector  [1]  and  the  minimum  mean 
square  error  (MMSE)  receiver  [2]  are  the  two  most  popular 
linear multiuser detectors; indeed, both receivers have a complexity
which is linear in the number of users and achieve optimum
performance in terms of near-far resistance. Both detectors,
however, exploit only the information contained in the
autocorrelation  function  of  the  observables.  While  this  is  the 
optimum  strategy  when  dealing  with  proper  complex  random 
processes,  it  turns  out  to  be  suboptimal  in  situations  where  the 
disturbance  is  an  improper  complex  random  process1.  Since, 
as  shown  in  [3],  the  multiaccess  interference  (MAI)  can  be 
modeled  as  an  improper  complex  noise,  it  is  expected  that 
designing  receiving  structures  capable  of  exploiting  the  infor¬ 
mation  contained  in  the  pseudoautocorrelation  function  of  the 
observables  would  permit  achieving  better  performance.  In 
this  work,  we  deal  with  the  problem  of  linear  multiuser  de¬ 
tection:  a  new  cost  function  is  introduced,  whose  constrained 
minimization  leads  to  new  versions  of  the  decorrelating  and 
the  MMSE  receivers  exploiting  the  information  contained  in 
the  observables  pseudoautocorrelation  function. 

II.  System  Model  and  Detectors  Design 

We consider an asynchronous DS/CDMA network with K
active users. Stacking in an NM-dimensional vector² the
discrete-time samples of the received waveform in the p-th
signaling interval $[pT_b, (p+1)T_b]$, and assuming that we are
interested in decoding the bits from the user “0” and that
$\tau_0 = 0$, we obtain the vector:

$$r(p) = A_0 b_0(p) s_0 + z(p) + w(p) \qquad (1)$$

¹According to [3], a complex random process n(t) is said to be
proper if its pseudoautocorrelation function $R_n(t,u) = E[n(t)n(u)]$
is zero $\forall\, t, u$, and it is said to be improper in the opposite case that
$R_n(t,u)$ is non-zero.

²N is the processing gain, M is the number of samples per chip.

In the above equation, z(p) and w(p) contain the contribution
from the MAI and from the additive thermal noise, respectively.
Given the NM-dimensional observable vector r(p), any
linear receiver takes a decision as to the bit $b_0(p)$ according to
the rule:

$$\hat{b}_0(p) = \mathrm{sgn}\left(\Re\{d_0^H r(p)\}\right) \qquad (2)$$
where sgn(·) denotes the signum function, $\Re\{\cdot\}$ denotes real
part, while the vector $d_0 \in \mathbb{C}^{NM}$ is to be designed according
to some optimization criterion. Since, inspecting the decision
rule (2), it is seen that the receiver performance is impaired
by the disturbance term $\Re\{d_0^H (z(p) + w(p))\}$, we propose
here to choose the vector $d_0$ as the solution to the following
constrained minimization problem:

$E[(\Re\{d_0^H h(p)\})^2] = \min$   s.t.   = 1   (3)

The  solution  to  the  above  problem  can  be  shown  to  be  written 
as: 

do  —  HwSo  —  MvvMvvHvvso  (4) 

with  Hvv  —  2  ^ Mvv  —  M'VvM^vMvv'j  .  In  the  above 

equations, $(\cdot)^+$ denotes the Moore-Penrose generalized inverse,
$M_{vv} = E[v(p)v^H(p)]$ and $M'_{vv} = E[v(p)v^T(p)]$ represent
the covariance and pseudocovariance matrix of the vector
v(p), respectively³, while, finally, $v(p) = A_0 b_0(p) s_0 + h(p)$. It
can be shown that, if we let h(p) = z(p), solution (4) represents
a generalization of the decorrelating detector, while, if
we let h(p) = z(p) + w(p), solution (4) is a generalization of
the MMSE linear multiuser detector. Only in the special case
that the matrix $M'_{vv}$ is zero does solution (4) reduce to the
classical decorrelating and MMSE linear multiuser detectors.
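
As a rough numerical illustration of the proper/improper distinction drawn in footnote 1, the sketch below estimates the variance E[|x|^2] and the pseudo-variance E[x x] of two scalar disturbances. It is not taken from the paper: the sample size, the seed, the complex gain and the helper names `variance` and `pseudo_variance` are all assumptions made for the example. A circular Gaussian noise gives a pseudo-variance near zero, while real antipodal bits scaled by a fixed complex gain (a crude stand-in for one MAI contribution) give a pseudo-variance of the same magnitude as the variance.

```python
import random

def variance(samples):
    """Sample estimate of E[|x|^2] for zero-mean complex samples."""
    return sum(abs(x) ** 2 for x in samples) / len(samples)

def pseudo_variance(samples):
    """Sample estimate of E[x * x] (no conjugate); zero for a proper variable."""
    return sum(x * x for x in samples) / len(samples)

rng = random.Random(0)  # fixed seed, illustrative only
n = 20000               # number of samples (assumed value)

# Proper (circular) Gaussian noise: independent real and imaginary
# parts with equal variance 1/2, so that E[|x|^2] = 1.
noise = [complex(rng.gauss(0, 0.5 ** 0.5), rng.gauss(0, 0.5 ** 0.5))
         for _ in range(n)]

# Improper disturbance: real antipodal bits times a fixed complex gain
# (hypothetical value with unit modulus).
gain = complex(0.6, 0.8)
mai = [rng.choice((-1.0, 1.0)) * gain for _ in range(n)]

print(variance(noise), abs(pseudo_variance(noise)))  # second value is near 0
print(variance(mai), abs(pseudo_variance(mai)))      # both values are near 1
```

A receiver that only uses the covariance cannot tell these two disturbances apart, which is the motivation for also using the pseudocovariance $M'_{vv}$ in solution (4).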

As to the performance assessment, it can be shown that the
newly proposed receivers outperform the classical linear
multiuser receivers. Indeed, an analytical proof showing that the
new receivers achieve a near-far resistance not smaller than
that of the classical decorrelating receiver can be given, while
computer simulations confirm the superiority of the new
receivers in terms of error probability too.

Abstract — In this summary, we present the minimum mean
squared error multiuser detector for noncoherent detection of
non-orthogonal multipulse modulation. The detector is analyzed
in the large signal-to-noise ratio regime and it is shown that the
MMSE detector approaches a new multiple access interference
suppressing detector, termed the multipulse decorrelating (MD)
detector. The asymptotic performance of the detectors is
presented for the additive white Gaussian noise noncoherent
channel.


I.  Introduction 


We introduce the minimum mean squared error (MMSE) detector
for non-orthogonal multipulse modulation (NMM) over the
noncoherent multiple access channel. NMM is a generalization of
orthogonal modulation in which the users may transmit correlated
waveforms, allowing for bandwidth efficiency. The generalized maximum
likelihood (GML) detector has been studied in [1]-[6] for detection
on such channels and we compare the GML with the MMSE detector
for large values of the signal-to-noise ratio (SNR). The MMSE will be
seen to approach a new detector, termed the multipulse-decorrelating
(MD) detector, at large SNRs. We show that the MD (and hence
the MMSE) detector is asymptotically superior to the GML detector
for binary (M = 2) signalling but that this performance advantage does
not generalize to larger cardinality signal sets.


II.  Discrete  Time  Model 


We adopt the following discrete model for the output of the
noncoherent multiple access channel after basis function matched filtering:

$y = \mathcal{H} V b + n.$   (1)

The matrix $\mathcal{H} = [H(1), H(2), \ldots, H(K)]$ contains the signal
vectors for each user with $H(k) = [h_1(k), h_2(k), \ldots, h_M(k)]$ and
$h_m(k)$ is the mth signal corresponding to user k. The vector
$b = [b^T(1), b^T(2), \ldots, b^T(K)]^T$ is an MK x 1 vector with each
b(k) a column of the M x M identity matrix which selects the signal
transmitted by user k. The MK x MK matrix V contains the user
energy and phase terms. The additive noise, n, is modeled as
zero-mean complex Gaussian with correlation matrix $E[nn^*] = \sigma^2 I$.

Assuming that the phase terms are independent zero mean random
variables, the measurement y has first and second order statistics:
$m = E[y] = 0$ and $R = E[yy^*] = \mathcal{H} F \mathcal{H}^* + \sigma^2 I$, where
$F = \mathrm{diag}\{E_1 I, \ldots, E_K I\}$ and $E_k$ is the energy associated with the kth
user.


This work was supported by the National Science Foundation under Contract No. ECS 9979400 and
by the Office of Naval Research under Contract No. N00014-00-1-0033. The results contained in this
paper have been submitted to the IEEE Transactions on Communications.


III.  The  MMSE  Detector 


The (linear) minimum mean squared error estimate of the vector
Vb is given by $\widehat{Vb} = F \mathcal{H}^* R^{-1} y$. We make decisions on user k by
examining the kth block, $\widehat{D(k)b(k)}$, of the estimate $\widehat{Vb}$. This leads
to the simple decision rule:

$\hat{m}_{MMSE}(k) = \arg\max_m \|\widehat{D(k)b(k)}\| = \arg\max_m |h_m^*(k) R^{-1} y|$   (2)

where $R = E[yy^*]$.

At high values of the signal-to-noise ratio (SNR) we find that the
MMSE detector approaches the asymptotic form:

$\hat{m}_{MD}(k) = \arg\max |(H(k)^* P^{\perp}_{S(k)} H(k))^{+} H(k)^* P^{\perp}_{S(k)} y|$

where $A^{+}$ is the pseudo-inverse of the matrix A and the interference
matrix S(k) has the signals H(l) for l ≠ k as its columns. We call
this detector the multipulse decorrelating (MD) detector in analogy
to the linear decorrelating detector of [7].

Using  standard  union  bounds  and  two  signal  lower  bounds  we  find 
that  as  the  SNR  grows  large,  the  error  probability  for  the  MD  (and 
hence  the  MMSE)  detector  has  the  asymptotic  form 


■  exp 


min  ■ 

m^l 


-Ek 


((G*G)m,m  +  (G*G)("^  +2  (G*G)^,P 


•1)1 


(3) 


where we have defined $G = P^{\perp}_{S(k)} H(k)$.

By comparing this expression with the probability of error for the
generalized maximum likelihood (GML) detector of [1]-[5] we find
that for M = 2 (binary signaling) the MD (MMSE) detector is
superior to the GML rule asymptotically. This result does not generalize
to larger cardinality signal sets. Neither detector is uniformly
superior over constellation sizes M > 2.

Abstract — In this paper we present a new group
detection strategy for asynchronous DS/CDMA
systems. The new detector is a two-stage one: the first
stage is a linear filter, aimed at suppressing the
effect of the unwanted users' signals, while the second
stage is a non-linear block, implementing a maximum
likelihood (ML) detection rule on the set of desired
users' signals. Simulation results confirm that the new
structure, which encompasses Varanasi's group
detector as a special case, achieves very satisfactory
performance.

I.  Introduction 

Given  a  Direct-Sequence  Code-Division  Multiple-Access 
(DS/CDMA)  communication  system  with  K  active  users, 
a  group  detector  jointly  demodulates  the  information  bits 
stream  from  a  certain  subset,  Q  say,  of  the  K  transmitting 
users.  The  concept  of  group  detection  was  first  introduced 
by  Varanasi  in  [1],  wherein,  with  reference  to  a  synchronous 
CDMA  system,  new  receiving  structures  were  derived,  based 
on  the  application  of  the  generalized  likelihood  rule. 

In the very recent past, group detection has become an
attractive and intriguing research topic, in that it has been
recognized that it can be successfully applied to wireless
cellular communications so as to come up with multiuser receivers
able to suppress both the intra-cell and the inter-cell
interference [2] and with single-user receivers for multirate/multicode
CDMA systems [3].

In this work we consider an asynchronous DS/CDMA
system and present a new group detection structure; it is a
two-stage receiver: the first stage is a linear filter, aimed at
suppressing the multiaccess interference (MAI), and whose
G-dimensional output (with G the cardinality of the set Q of
the desired users) is forwarded to the second stage, a
non-linear device, which implements a maximum-likelihood
detection strategy and takes the final decision on the G bits of
interest.

II.  System  Model  and  Receiver  Synthesis 

We consider an asynchronous DS/CDMA system with K
active users. Assume, without loss of generality, that the
users to be decoded are indexed by 0, ..., G - 1, namely that
Q = {0, ..., G - 1}. It follows that the discrete-time
version of the complex envelope of the received waveform in the
bit-interval $[pT_b, (p+1)T_b]$, r(p) say, is an NM-dimensional¹
vector expressed as:

$r(p) = \sum_{k=0}^{G-1} A_k e^{j\phi_k} b_k(p) s_k^0 + z(p) + w(p)$   (1)

¹ N is the processing gain and M is the number of samples per
chip.


wherein z(p) and w(p) represent the discrete-time versions
of the MAI and of the thermal noise. We also suppose that
the codes . . . , $s_{G-1}$ are linearly independent. The
detection structure that we consider is depicted in figure 1. The
filter D is chosen as the solution to the following constrained
minimization problem:

$E[\|D^H h(p)\|^2] = \min$   subject to:   $D^H S_c = K$   (2)

wherein $S_c$ is an NM x G-dimensional matrix containing on
its columns the signatures of the users to be decoded, and
K is a G x G matrix, which we assume to have full rank. If
we let h(p) = r(p) - w(p), the filter D ends up coincident
with a decorrelating group detector, and, if the signatures of
all of the active users are linearly independent, it zero-forces
the MAI. Under this circumstance, it can also be shown that,
if we let M = 1 and consider a synchronous CDMA system,
the receiver structure in figure 1 reduces to the group detector
presented in [1]. On the other hand, if we let h(p) = r(p), the
filter D reduces to a group MMSE detector. It is also seen
that the choice of the constraint matrix K has no effect on the
system structure, so that we can set K equal to the identity
matrix.

As to the performance assessment, simulation results
confirm that the new structure achieves very satisfactory
performance. In keeping with the single-user receivers' behavior, the
MMSE group detector outperforms its decorrelating
counterpart, especially for a large number of users and/or for
power-controlled systems.

Finally, it is worth pointing out that the new structures
may be implemented in a blind adaptive fashion through a
straightforward application of the Recursive-Least-Squares
algorithm or of subspace tracking algorithms.


Figure  1:  Block-scheme  of  the  group  detector. 

Abstract - In this paper we discuss a research effort focused
on  the  design  and  analysis  of  robust,  low-complexity, 
adaptive  wireless  receivers  that  exhibit  good  performance 
characteristics  in  the  presence  of  multiple  sources  of 
complex  structured  interference  as  well  as  significant 
uncertainty  regarding  the  exact  structure  of  that  interference. 
We  consider  problems  primarily  related  to  the  code-division, 
multiple-access  (CDMA)  environment.  In  particular,  we 
study  adaptive  one-shot  linear-quadratic  (LQ)  receivers  for 
time-varying,  frequency-selective  CDMA  fading  channels. 
We  propose  a  novel  Bayesian  approach  to  this  problem  in 
which  receivers  are  designed  based  on  a  probabilistic 
channel  model  that  explicitly  incorporates  multiple  sources 
of  additive  interference  as  well  as  a  stochastic  structure  for 
the  channel  uncertainty.  Under  the  assumption  that  the 
probabilistic  structure  of  the  channel  model  is  known,  we 
develop  and  analyze  a  design  strategy  for  adaptive  LQ 
receivers  that  are  optimal  with  respect  to  the  assumed 
channel  model  and  robust  with  respect  to  uncertainty 
regarding  the  true  instantaneous  state  of  the  channel.  In 
addition,  we  develop  and  analyze  an  adaptive  modulation 
scheme  that  works  in  conjunction  with  the  proposed  LQ 
receivers  to  either  maximize  throughput  or  minimize 
probability  of  error. 

For  the  purposes  of  receiver  design  (but  not  performance 
analysis),  we  make  the  simplifying  assumption  that  all 
additive  interference  on  the  channel  is  Gaussian.  If  we  were 
to  restrict  attention  to  linear  detectors  incorporating 
antipodal  signals,  then  it  is  easy  to  show  that  the  proposed 
approach  to  receiver  design  would  lead  directly  to  an 
adaptive  minimum-mean-squared-error  (MMSE)  linear 
detector  analogous  to  the  one  developed  in  [1].  In  this 
respect,  the  proposed  approach  can  be  regarded  as  a 
generalization  of  linear  MMSE  detection  that  is  much  less 
sensitive  to  errors  in  channel  state  estimates.  Under  the 
Gaussian  assumption,  the  optimal  detectors  for  a  fixed  signal 
structure  and  known  estimates  of  the  current  channel  state 
are  necessarily  linear-quadratic.  We  show  that  if  the  second- 
order  structure  of  the  channel  is  known,  then  the  optimal 
detector  for  binary  signals  can  be  determined  by  maximizing 
a  particular  cost  function.  In  addition,  we  show  that  the 
maximum  value  of  the  proposed  LQ  cost  function  for  any 
pair  of  transmitted  signals  is  equivalent  to  the  Kullback- 
Leibler  (KL)  distance  between  the  two  corresponding 
Gaussian  hypotheses.  Hence,  maximizing  the  cost  function 
simultaneously  with  respect  to  both  the  signal  and  detector 
structure  gives  the  optimal  LQ  detector  for  the  signal  pair 


that  maximizes  the  KL  distance  between  the  corresponding 
Gaussian  hypotheses  subject  to  the  known  estimates  of  the 
channel  state  and  the  second-order  channel  structure. 
Furthermore,  the  structure  of  the  cost  function  can  be 
exploited  to  develop  efficient  adaptive  algorithms  for 
simultaneous  signal  selection  and  receiver  design.  Finally, 
this  approach  can  be  extended  straightforwardly  to  M-ary 
signal  constellations  in  order  to  adapt  the  modulation  and 
detector  structure  to  give  minimum  probability  of  error  at  a 
fixed  data  rate  or  maximum  data  rate  at  a  fixed  probability  of 
error.  This  leads  to  adaptive  modulation  schemes  that  are 
analogous  to  those  discussed  in  [2,  3]  but  much  less  sensitive 
to  errors  in  channel  state  estimates. 

As  an  indication  of  the  potential  of  this  approach,  consider 
the  results  of  a  simulation  experiment  presented  in  Figure  1. 


Figure 1. MMSE Linear Receiver versus LQ Receiver with
Adaptive Binary Signaling at 8 dB SNR

Abstract — The problem of error-free filtering for
a discrete-time, stationary, singular, stochastic
process $X = \{X_n\}$ from observations $Y = \{Y_n\}$ in the case
of dependent distortions, i.e., where the pair (X, Y)
forms a jointly stationary process, is considered.

Let (X, Y) be a two-dimensional, partially observable,
discrete-time, stationary stochastic process where $X = \{X_n\}$
is the nonobservable and $Y = \{Y_n\}$ the observable component.
We are interested in finding rather general conditions
under which error-free filtering of $X_n$ from the observations
$\{Y_j, j \le n\}$ is possible. In [1, 2], such conditions were pointed
out for the case where Y = X + Z and X and Z are
independent stationary processes. In [3], conditions of this kind were
found for the case of independent distortions, i.e., under the
assumptions that, given a sequence $\{X_n\}$, the observations
$\{Y_n\}$ are independent and

$P\{Y_n \in dy \mid X_j, j = 0, \pm 1, \ldots\} = P\{Y_n \in dy \mid X_n\}$

and, moreover, the conditional distribution $P\{Y_n \in dy \mid X_n\}$
does not depend on n. Thus, in this case, the observations
$\{Y_n\}$ can be considered as an output sequence of a stationary
memoryless channel whose input sequence is $\{X_n\}$.

Here, we consider a more general situation where, given a
sequence $\{X_n\}$, the observations $\{Y_n\}$ can be dependent. The
results obtained here are rather close to some results of [4]
though, in contrast to [4], we consider causal filtering and,
moreover, the methods of proof are entirely different.

Let us now assume that the process X being estimated
and the observable process Y form a two-dimensional jointly
stationary stochastic process. Moreover, we shall assume that
X is a singular process with a finite number of values in a set
$\mathcal{X}$ and Y takes values in a measurable subset $\mathcal{Y}$ of the real
line.

To state the main result, we introduce the notion of
conditional regularity for a pair of processes $X = \{X_n\}$ and
$Y = \{Y_n\}$. To do this, we need some definitions. Denote by $\mathcal{X}^I$
the set of all infinite sequences $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$,
$x_i \in \mathcal{X}$, $i = 0, \pm 1, \ldots$. The set $\mathcal{Y}^I$ is defined similarly. Let
$\mu_x(\cdot)$, $x \in \mathcal{X}^I$, be a probability measure on $\mathcal{Y}^I$ (with σ-algebra
of measurable sets generated by all cylinder sets) which is a
conditional distribution of $Y = \{Y_n\}$ given X = x, $x \in \mathcal{X}^I$,
i.e., $\mu_x(\cdot) = P_{Y|X=x}(\cdot)$.

Assume  now  that  the  measures  Px{-),  x  €  X1,  have  the 
following  property:  for  all  B  €  oo,oo)  =  (J^V(— oo,t»] 

(where  JV(— oo,n]  =  <r{Y,  ,  j  <  n}  is  the  (7-algebra  generated 

1  This  work  was  supported  in  part  by  the  Russian  Fundamental 
Research  Foundation  under  Grant  99-01-00828. 


by  values  of  the  process  Y  =  {Yy}  up  to  time  instant  n) 

sup  \nx(AB)  —  /xx(A)/ix(S)|  — *•  0  as  n  -»  -oo 

AG^irC  —  oo,n) 

uniformly in x for almost all $x \in \mathcal{X}^I$ (with respect to the
measure generated by the process X). In this case, we shall
say that the process $Y = \{Y_n\}$ is conditionally regular with
respect to $X = \{X_n\}$.

Note that in the case where X and Y are independent,
the notions of conditional regularity and (usual) regularity
coincide.

Let  x  €  X1 ,  be  a  projection  of  px(*)  on  the  space 

of  values  of  the  random  variable  Yn,  i.e.,  /**(•)  =  Py„|x=x(0 
is  the  conditional  distribution  of  Yn  given  A  =  x.  Finally, 
denote  by  pj(-)  =  Py„|xn=*(-)»  a  €  X,  a  probability  measure 
on  y  which  is  a  conditional  distribution  of  Y, ,  given  A„  = 
x.  In  the  case  of  jointly  stationary  processes  A  and  Y  the 
measure  !/£(•)  does  not  depend  on  n  and  will  be  denoted  by 

t'x(-). 

The  main  result  is  given  by  the  following 

Theorem.  If  Y  is  conditionally  regular  with  respect  to  X, 

/£°0)  =  *'..(•) 
for  any  x  =  (. . .  x_i ,  Xo ,  x\ , . . .),  and 
?*(•)  #  *V(0  f°r 

then  for  any  integers  m  and  n  the  equality 
E[  A„  I YJ^]  =  A„  a.*., 

holds,  i.e.,  the  value  of  A»,  n  =  0,  ±1, . . .,  can  be  reconstruct¬ 
ed  without  error  from  the  observations  YTJ*,  =  {Y,  ,  j  <  m}. 

The  proof  of  this  theorem  can  be  found  in  [5]. 

Abstract — The purpose of this paper is to
describe the extension of the Whittaker-Shannon
sampling theorem to the case of signals observed in the
presence of noise. We introduce a class of signal
recovery methods that are a smooth correction of the
cardinal series. Both band-limited and non-band-limited
signals are considered. The weak and strong $L_2$
consistency of the algorithms are established and the rate
of convergence is investigated.

I.  Introduction 

The Whittaker-Shannon (WS) sampling-interpolation
theorem is generally recognized as a milestone in information
theory, communication systems, signal processing as well as
Fourier analysis [1]. The result may be briefly stated as
follows. Consider a class BL(Ω) of band-limited signals with
bandwidth Ω and finite energy. The WS theorem says that
every f ∈ BL(Ω) can be reconstructed from its discrete values
f(jτ), j = 0, ±1, ±2, ... by

$f(t) = \sum_{j=-\infty}^{\infty} f(j\tau)\,\mathrm{sinc}\left(\frac{\pi}{\tau}(t - j\tau)\right)$   (1)

provided that τ ≤ π/Ω, where sinc(t) = sin(t)/t. Formula
(1) is frequently referred to as the cardinal series or the WS
interpolation scheme. Many extensions of (1) have been given
for the case when some assumptions of the sampling theorem
are not satisfied. In particular, truncation, aliasing, location
(jitter), and amplitude errors of the WS cardinal expansion have
been examined. Furthermore, generalizations to multiple
dimensions, random signals, not necessarily band-limited signals,
missing data, wavelet subspaces and irregular sampling have
been proposed [1], [4]. Relatively little attention, however,
has been given to the problem of signal sampling in the
presence of random noise. This issue has been mentioned a
number of times in the signal processing literature, but no
algorithms with established convergence properties for signal
reconstruction from noisy data were given. A rigorous
theoretical treatment of this problem has been given in [2] and
[3]. In all these papers a particular class of reconstruction
algorithms has been examined and only band-limited signals
have been taken into account. In this paper we study the
previously introduced algorithms for both band-limited and
non-band-limited signals. We observe that each
particular technique can have good reconstruction accuracy and no
technique dominates universally over a large class of signals.
Hence our principal goal in this paper is to reconstruct a
signal f(t) from the following finite record of sampled and noisy
data $y_j = f(j\tau) + \varepsilon_j$, |j| < n, where τ is a sampling rate.
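
The cardinal series (1) can be tried out numerically with a short sketch. The code below is only an illustration under stated assumptions, not the estimators studied in the paper: the test signal `f`, the spacing `tau`, the truncation index `n` and the function names are hypothetical choices. It evaluates the series truncated to a finite record of noise-free samples and compares the result with the true signal value.

```python
import math

def sinc(t):
    """sinc(t) = sin(t)/t, with the removable singularity at t = 0 filled in."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def cardinal_series(samples, tau, t):
    """Truncated WS series: sum over j of y_j * sinc((pi / tau) * (t - j * tau)).

    `samples` maps the integer index j to the sample value y_j.
    """
    return sum(y * sinc(math.pi / tau * (t - j * tau))
               for j, y in samples.items())

def f(t):
    """Illustrative finite-energy test signal with bandwidth 1."""
    return sinc(t / 2.0) ** 2

tau = 2.0  # sample spacing, below pi / Omega for Omega = 1
n = 200    # the finite record uses indices j = -n, ..., n (assumed range)
clean = {j: f(j * tau) for j in range(-n, n + 1)}

# The two printed numbers should agree closely; the residual is truncation error.
print(cardinal_series(clean, tau, 0.7), f(0.7))
```

Feeding noisy values $y_j$ into `cardinal_series` reproduces the noise at every sample point, since the series interpolates its input exactly there; this is one way to see why the estimators considered in the paper smooth the data instead of interpolating it.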

We examine two types of estimators of f(t) motivated by
the cardinal expansion formula. The first one is a kernel type
estimator with the sinc function being the reproducing
kernel for BL(Ω). The second class uses an orthogonality
property of the sinc function, yielding an orthogonal series
estimate. We also extend our theory to the case of not necessarily
band-limited signals. Hence we show that our estimates can
adapt to a larger class of signals. This is carried out by
approximating a non-band-limited signal in $L_2(R)$ by a sequence
of band-limited functions with the bandwidth increasing to
infinity, i.e., $\Omega = \Omega_n \to \infty$ as $n \to \infty$. It is worth noting
that, by allowing Ω to vary, our construction can be viewed from
the perspective of wavelet interpolation subspaces
(multiresolution theory). Let us note, however, that our estimation
algorithms do not interpolate the data, which is a necessary property in
the presence of noise.

II.  Estimators  and  Results 
The first estimator of f(t) is based on the fact that K(t) =
sin(Ω't)/(πt) is a reproducing kernel for BL(Ω) provided that
Ω ≤ Ω'. Hence we obtain the following kernel estimate
$f_n(t) = $   Note that $f_n \in BL(\Omega')$,

hence $f_n$ lives in the same space as f. Such a property is
not shared by ordinary kernel estimators. To construct the
second estimator we observe that (1) can be written in the
following equivalent form $f(t) = \sum_{j=-\infty}^{\infty} f(j\tau) s_j(t)$, where
it is known that $\{s_j(t) = \mathrm{sinc}(\pi(t - j\tau)/\tau), j = 0, \pm 1, \ldots\}$
defines an orthogonal and complete system of functions for
BL(Ω), provided that τ ≤ π/Ω.

The above interpretation of the cardinal expansion
suggests the following orthogonal type estimate of f(t): $f_n(t) = $
 , where $y_k = $  is the weighted

moving average of $\{y_k\}$ in the neighborhood of f(kτ). The
global mean integrated squared error (MISE) convergence rates
for the aforementioned estimators are established. In
particular it is shown that for f ∈ BL(Ω) we have $MISE(f_n) =
O(n^{-r/(r+1)})$ and $MISE(f_n) = O(n^{-2(2r+1)/(5r+7)})$, where
the index r > 1 defines the decay of band-limited signals at
±∞. The rate of convergence for $f_n$ is valid for all positive
weight sequences $\{w_k\}$.

Abstract — In this paper, the linear
almost-periodically time-variant (LAPTV) filtering of
generalized almost-cyclostationary (GACS) signals is
considered in the fraction-of-time probability framework.
It is shown that in general GACS signals, when
processed by LAPTV filters, deliver output signals with
zero power.

I.  Introduction 

Very recently, a class wider than that of the
almost-cyclostationary (ACS) signals has been introduced and
characterized [2]. Signals belonging to this class are called
generalized almost-cyclostationary (GACS) and exhibit multivariate
statistical functions that are almost-periodic functions of time
whose Fourier series expansions have coefficients and
frequencies that can depend on the lag shifts of the signals. Moreover,
the union over all the lag shifts of the lag-dependent frequency
sets is not necessarily countable. The GACS signals have been
characterized in [2] in both time and frequency domains in
terms of generalized cyclic statistics.

In  this  paper,  the  linear  almost-periodically  time-variant 
(LAPTV)  filtering  of  GACS  signals  is  considered  in  the 
fraction-of-time  (FOT)  probability  framework  [1]  in  which 
statistical  parameters  are  defined  through  infinite-time  aver¬ 
ages  of  a  single  time-series  rather  than  ensemble  averages  of  a 
stochastic  process. 

II.  LAPTV  FILTERING  OF  GACS  SIGNALS 

Let us consider the impulse-response function of a LAPTV
system

$h(t,u) = \sum_{\sigma \in \Omega} h_\sigma(t - u)\, e^{j2\pi\sigma u}$,   (1)

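where Ω is the set of the frequency shifts introduced by the
system. It can be shown that the Nth-order temporal moment
function (TMF) of the output y(t), that is, the almost-periodic
component contained in the Nth-order lag product of y(t), is
given by

$\mathcal{R}_y(\mathbf{1}t + \boldsymbol{\tau}) = \sum_{\boldsymbol{\sigma} \in \Omega^N} e^{j2\pi\boldsymbol{\sigma}^T(\mathbf{1}t + \boldsymbol{\tau})}\, \mathcal{D}_{x,\boldsymbol{\sigma}}(\mathbf{1}t + \boldsymbol{\tau})$,   (2)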


It can be shown that if $h_\sigma(t) \in L^2(\mathbb{R})$ and x(t) is a GACS
signal not containing any ACS component, then $\mathcal{D}_{x,\boldsymbol{\sigma}}(\mathbf{1}t + \boldsymbol{\tau})$
is infinitesimal as $\|\boldsymbol{\tau}\| \to \infty$ and then, as $|t| \to \infty$. Therefore,
$\mathcal{D}_{x,\boldsymbol{\sigma}}(\mathbf{1}t + \boldsymbol{\tau})$, as a function of t, is a function with zero power
so that the product $\mathcal{D}_{x,\boldsymbol{\sigma}_1}(\mathbf{1}t + \boldsymbol{\tau}_1)\, \mathcal{D}_{x,\boldsymbol{\sigma}_2}(\mathbf{1}t + \boldsymbol{\tau}_2)$ does not

contain any additive sinewave component. Thus, the TMF
(2) of any order N is zero in the temporal mean-square sense,
that is, the output signal has zero power. Moreover, it turns out
that the output TMF can be nonzero only if the
input time-series contains ACS components (in which case the
output time-series is ACS), unless some function $h_{\sigma_n}(\cdot)$
contains impulsive terms, as in the case of systems introducing
constant time delays or frequency shifts.
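
To make the LAPTV input-output relation implied by (1) concrete, the following discrete-time sketch computes $y[n] = \sum_\sigma \sum_u h_\sigma[n-u]\, e^{j2\pi\sigma u}\, x[u]$ for a finite input. It is an illustration only: the discretization, the function name `laptv_filter`, the representation of the branches as a dictionary and the test input are assumptions, not part of the paper.

```python
import cmath

def laptv_filter(x, branches):
    """Discrete-time sketch of a LAPTV system.

    `branches` maps each frequency shift sigma (in cycles per sample) to the
    taps of the impulse response h_sigma. The output is
    y[n] = sum over sigma and u of h_sigma[n - u] * exp(j 2 pi sigma u) * x[u].
    """
    y = []
    for n in range(len(x)):
        acc = 0j
        for sigma, taps in branches.items():
            for m, h in enumerate(taps):
                u = n - m  # input index reached by tap m
                if 0 <= u < len(x):
                    acc += h * cmath.exp(2j * cmath.pi * sigma * u) * x[u]
        y.append(acc)
    return y

x = [1.0, 1.0, 1.0, 1.0]
y_lti = laptv_filter(x, {0.0: [1.0]})     # single branch, no shift: output equals input
y_shift = laptv_filter(x, {0.25: [1.0]})  # pure frequency shift of a quarter cycle per sample
print(y_lti)
print(y_shift)
```

The second call is an instance of the impulsive case mentioned above: with a single unit tap the system only shifts the input in frequency.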

Some limitations in the applicability of (higher-order)
cyclostationarity-based signal processing algorithms arise
when increasing the collect time makes the GACS
model more appropriate than the ACS one, since possible
time variations of timing parameters of the signals must be
taken into account. In fact, in such a case, (generalized)
cyclic statistic estimates of the output signal are
asymptotically zero when the collect time approaches infinity.
Therefore, there exists an upper limit to the maximum usable collect
time and, consequently, there exists a limit to the minimum
acceptable signal-to-noise ratio for cyclostationarity-based
algorithms which are, in principle, under mild assumptions,
intrinsically immune to the effects of noise and interference,
provided that the collect time approaches infinity.

The identically zero (generalized) cyclic statistics of the
LAPTV filtered GACS signals are a consequence of the
properties of the single observed time-series (e.g., the possible time
variation of a timing parameter, such as the carrier frequency
or the baud rate). In contrast to this, statistical functions of
a stochastic process can be identically zero as a consequence
of the presence, in the stochastic process model, of a
random parameter whose effect is to make the statistical
expectations equal to zero. In such a case, however, in general the
stochastic process is not ergodic. Therefore, the FOT
probability framework is very attractive due to the equivalence
between theoretical statistical functions and their asymptotic
estimates.


where 




2Wi t+r)=  f  <^>dA. 

'/K'Vn=l 

(3) 


In (2) and (3), $\mathbf{1} = [1, \ldots, 1]^T$, $H_\sigma(f)$ is the Fourier
transform of $h_\sigma(t)$, and $\mathcal{S}_x(\boldsymbol{\lambda})$ is the Nth-order spectral moment
function of the input signal x(t), that is, the N-fold Fourier
transform of the Nth-order TMF.

Abstract — We explore in this paper the lattice
sphere packing representation of a multi-antenna
system and the algebraic space-time (ST) codes. We
apply the sphere decoding (SD) algorithm to the
resulting lattice code. For the uncoded system, SD
yields, with a small increase in complexity, a huge
improvement over the well-known V-BLAST detection
algorithm. SD of algebraic ST codes exploits the
full diversity of the coded multi-antenna system, and
makes the proposed scheme very appealing to take
advantage of the richness of the multi-antenna
environment. The fact that the SD does not depend on the
constellation size gives rise to systems with very high
spectral efficiency, maximum likelihood (ML)
performance, and low decoding complexity.

I.  Introduction 

Recently, the field of multi-antenna processing and space-time
(ST) coding has attracted large interest in the communication
community due to the huge capacity of the multi-antenna
environment [1]. Because of the high complexity of maximum
likelihood (ML) detection, sub-optimal detection schemes such as V-BLAST
have been proposed for the uncoded system [2].

In this paper, we prove that one can reach the ML performance
of the uncoded system with low complexity by applying
the sphere decoder [3] to the lattice sphere packing representation
of a multi-antenna system. Moreover, it is shown that
one can achieve the full diversity of the multi-antenna system
by using full spatial diversity rotated constellations without
adding redundancy [4], and still reach the ML performance
with reasonable complexity.
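As an illustration of the idea, the following is a minimal sketch of a depth-first sphere decoder for a real-valued model y = Hx + n with a square, nonsingular H and a finite per-dimension alphabet. It is not the algorithm of [3]: the function names, the Gram-Schmidt QR step, the infinite initial radius, and the real-valued PAM alphabet are all assumptions made for this example. A complex system would first be rewritten in an equivalent real form.

```python
import math

def qr_decompose(H):
    """Classical Gram-Schmidt QR of a square, nonsingular real matrix (list of rows).

    Returns Q as a list of orthonormal column vectors and R as an upper
    triangular matrix (list of rows) such that H = Q R.
    """
    n = len(H)
    cols = [[H[r][c] for r in range(n)] for c in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = v[:]
        for i, q in enumerate(Q):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            w = [a - R[i][j] * b for a, b in zip(w, q)]
        R[j][j] = math.sqrt(sum(a * a for a in w))
        Q.append([a / R[j][j] for a in w])
    return Q, R

def sphere_decode(H, y, alphabet):
    """Return the x in alphabet^n minimizing ||y - H x||^2 (hypothetical helper)."""
    n = len(H)
    Q, R = qr_decompose(H)
    z = [sum(a * b for a, b in zip(q, y)) for q in Q]  # z = Q^T y
    best = {"dist": math.inf, "x": None}
    x = [0] * n

    def search(level, partial):
        # Prune every branch whose partial distance already leaves the sphere.
        if partial >= best["dist"]:
            return
        if level < 0:
            # A full candidate inside the sphere: shrink the radius to it.
            best["dist"], best["x"] = partial, x[:]
            return
        for s in alphabet:
            x[level] = s
            resid = z[level] - sum(R[level][k] * x[k] for k in range(level, n))
            search(level - 1, partial + resid * resid)

    search(n - 1, 0.0)
    return best["x"]
```

Because R is upper triangular, the distance accumulates one coordinate at a time from the last component upward, which is what allows whole subtrees to be discarded without being enumerated.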

II.  Simulation  Results 

In simulations we use the q-QAM constellation, with q = 4, 16.
The average energy per bit is fixed to Eb = 1. We consider
a multi-antenna system with M transmitters and N = M
receivers. The algebraic coding is done over l periods by using
rotated constellations of dimension Ml. The channel transfer
matrix is modeled by independent complex Gaussian random
variables of variance 0.5 per real dimension. The curves are
plotted as a function of SNR (the signal-to-noise ratio per bit),
and  the  variance  a2  of  the  complex  AWGN  per  real  dimension 
is  adjusted  by  the  formula  a2  =  10~SN Rl10 1  where  Es 

is  the  average  symbol  energy  of  the  q-QAM  when  Eb  —  1  and 
equals  .  In  figures  1  and  2  we  applied  the  SD  on  both 

uncoded  data  streams  and  algebraic  ST  codes  over  l  periods 
with  M  =  N  transmit/receive  antennas  [4].  Comparisons  are 
done  with  the  V-BLAST  detection  algorithm  [2].  It  is  shown 
that  at  the  expense  of  a  moderate  increase  in  complexity,  a 
huge  improvement  in  performance  is  achieved. 


Fig. 1: SD of V-BLAST architecture, M = N = 8, average symbol
error rate of the 16-QAM modulation, 32 bits/s/Hz.


Fig. 2: SD of algebraic ST codes, M = N = 4, average symbol
error rate of the 4-QAM modulation, 8 bits/s/Hz, l = 1, 2, 3.
Abstract — The computational cut-off rate
is studied for the complex Rayleigh flat fading
spatio-temporal channel under a peak power
constraint. Any optimal finite constellation of
signals must admit an equalizer distribution
which equalizes the conditional decoding error
probabilities. For small constellations the set of
equiprobable mutually orthogonal unitary matrices
attains the cut-off rate. For low SNR these
matrices are rank one and a single transmit antenna
is as good as multiple antennas.

I.  Introduction 

There are M transmitter antennas and N receiver
antennas, and the MN channel fading coefficients are
i.i.d. complex Gaussian and constant over the coherent fade
interval of length T time periods [2]. While the SNR
η is known, the fading coefficients are unknown to both
transmitter and receiver. In a frequency hop system
each coherent fade interval corresponds to a different
frequency band. A baseband transmitted signal S_i is
a T × M matrix having complex valued entries. When
the S_i's are drawn from a constellation of K possible
signals the signalling rate is K/T symbols/sec/Hz. The
computational cut-off rate specifies the maximum practical
rate that can be supported by the channel and is
often simpler to calculate than channel capacity. For
more details on the following results see [1].


II.  Cutoff  Rate  Representations 

For any K-dimensional constellation $\{S_1, \ldots, S_K\}$
define the K × K dissimilarity matrix $E_K =
((e^{-N D(S_i \| S_j)}))_{i,j=1}^{K}$ where

Hef  ,  |/T+?(SiS.»+SjS?)|2 

is  a  pairwise  distance  function  between  signal  matrices 
in  the  constellation. 


Proposition 1 [1] Let $R_0(K)$ denote the peak power
constrained cut-off rate restricted to K-dimensional
constellations. Then

Ro(K)  =  log  <  max  max  1  TEZl\k 

1This research was performed, in part, while the first author
was visiting the Mathematical Sciences Research Center,
Bell Laboratories, Lucent Technologies.
where $\mathcal{S}_K^{\mathrm{peak}}$ is the set of K-dimensional peak
constrained constellations for which there exists an “equalizer
probability vector” $P_K$ satisfying $E_K P_K = c\,\mathbf{1}_K$ for
some c > 0.


III.  Optimality  of  Unitary  Orthogonal 
Matrices 


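For given η, T and M define the integer M0

$$M_0 = \arg\max_{m \in \{1, \ldots, M\}} \left\{ m \log \frac{(1 + \eta T M/(2m))^2}{1 + \eta T M/m} \right\} \qquad (1)$$

and

$$D_{\max} \stackrel{\mathrm{def}}{=} \max_{S_1, S_2 \in \mathcal{S}_K^{\mathrm{peak}}} D(S_1 \| S_2) = M_0 \log \frac{(1 + \eta T M/(2M_0))^2}{1 + \eta T M/M_0} \qquad (2)$$

$D_{\max}$ is the maximum value of the minimum distance
achievable by any constellation of dimension K <
T/M0.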


Proposition 2 [1] Let 2M < T and let M0 be as defined
in (1). Suppose that M0 < min{M, T/K}. Then

R0{K)  =  log 

and $D_{\max}$ is given by (2). Furthermore, the optimal
constellation attaining $R_0(K)$ is the set of K equiprobable
rank M0 mutually orthogonal unitary matrices:
$S_i^H S_i = I_{M_0}$ and $S_i^H S_j = 0$, $i \neq j$.

The rank M0 of the matrices in the optimal constellation
increases from 1 to M as the SNR parameter
ηTM increases from 0 to ∞. If the SNR is sufficiently
large, e.g. for M = 6 and T > 12 if ηTM > 17, then
M0 = M and the optimal signal matrices utilize all M
transmit antennas. On the other hand, for small SNR,
e.g. ηTM < 4, M0 = 1 and the optimal signal
matrices apply all available transmit power to a single
antenna element over each coherent fade interval.
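The dependence of the optimal rank on the SNR parameter can be explored numerically. The sketch below assumes that the objective in (1) is m log[(1 + ηTM/(2m))² / (1 + ηTM/m)], which is the form consistent with the expression for Dmax; the function names and the single argument `snr` (standing for the product ηTM) are hypothetical.

```python
import math

def rank_objective(m, snr):
    """Assumed objective of (1): m * log((1 + snr/(2m))^2 / (1 + snr/m)).

    `snr` stands for the product eta*T*M used in the text.
    """
    return m * math.log((1 + snr / (2 * m)) ** 2 / (1 + snr / m))

def optimal_rank(snr, M):
    """Integer m in {1, ..., M} that maximizes the assumed objective."""
    return max(range(1, M + 1), key=lambda m: rank_objective(m, snr))

def d_max(snr, M):
    """Value of the objective at the optimal rank, i.e. the assumed form of (2)."""
    return rank_objective(optimal_rank(snr, M), snr)
```

With M = 6, this sketch selects rank 1 at ηTM = 3 and rank 6 at ηTM = 20, in line with the low-SNR and high-SNR regimes described above.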
Abstract — In this paper, we study the capacity of
multi-antenna channels where the fading coefficients
are unknown to both the transmitter and the receiver. The
high SNR channel capacity is computed, and a geometric
interpretation as sphere packing in the Grassmann
manifold is given.

I.  Introduction 

Recent research has shown that by using multiple antennas
at both the transmitter and the receiver, the spatial diversity
provides much larger spectral efficiency than conventional
channels. In contrast to the single antenna AWGN channel,
where a 1 bps/Hz capacity gain can be achieved with every 3 dB
increase in SNR, in a channel with N transmit and N receive
antennas, it is shown that the corresponding capacity gain
is N bps/Hz [1].

The result above is derived under the key assumption that
the instantaneous fading coefficients are known or precisely
estimated at the receiver end. In practical applications, especially
mobile systems, the fading coefficients can change quite
rapidly and the precise estimation of the channel parameters
becomes difficult. In this paper, we will study the channel
capacity with no assumption on the prior knowledge of the
fading coefficients to understand the fundamental limit of
non-coherent multi-antenna communications.

II.  System  Model 

We will use the same model given in [2]. Assume the system
has N transmit and N receive antennas. The propagation
coefficients between all antenna pairs form an N × N random
matrix H with i.i.d. CN(0, 1) entries. H is unknown to the
transmitter and receiver. To approximate the continuously
varying coefficients, we assume that H remains constant for T
symbol periods, and changes to a new independent realization
afterwards. The time period over which H remains constant will
be referred to as the coherence interval, and T as the coherence
time. The channel in each coherence interval can thus be
written as

Y = HX + W

where $X, Y \in \mathbb{C}^{N \times T}$ are the transmitted and received signals,
respectively, and $W \in \mathbb{C}^{N \times T}$ is the additive white Gaussian noise.
The SNR at each receive antenna is denoted as ρ.

The goal of this paper is to compute the capacity of this
channel at high SNR, ρ → ∞. In [2], it is shown that increasing
the number of antennas N beyond T provides no capacity
gain. Therefore in this paper, we will only consider the case
where N < T.

'  This  research  is  supported  by  a  National  Science  Foundation 
Early  Faculty  CAREER  Award  and  by  DARPA  grant  F30602-97- 
2-0346. 


III.  Channel  Capacity 

To approach this problem, we need to introduce the following
new coordinate system. An N × T matrix R with N < T can
be represented as the N-dimensional subspace $\Omega_R$ spanned
by the row vectors, together with an N × N matrix $C_R$ which
specifies the N row vectors of R with respect to a prescribed
basis in $\Omega_R$. The transformation

$$R \to (C_R, \Omega_R)$$

is a change of coordinate system: $\mathbb{C}^{N \times T} \to \mathbb{C}^{N \times N} \times G(T, N)$.
Here G(T, N) is the Grassmann manifold defined as the set of
all N-dimensional subspaces of $\mathbb{C}^T$.

The motivation for using this new coordinate system is that
the transmitted subspace is not corrupted by the fading coefficients:
$\Omega_{HX} = \Omega_X$. Therefore, the new coordinates decompose
$\mathbb{C}^{N \times T}$ into the directions that are affected by both fading and
additive noise and those directions affected by noise alone. In
this new coordinate system, the relevant differential entropies
can be computed, and the optimization problem can be solved
more easily.

Theorem 1 For a system with N transmit and receive antennas,
if the coherence time T > 2N, the high SNR channel
capacity (bps/Hz) is given by

$$C(\rho) = \frac{1}{T} \log_2 |G(T, N)| + \left(1 - \frac{N}{T}\right) E[\log_2 \det(HH^*)] + N\left(1 - \frac{N}{T}\right) \log_2 \frac{T\rho}{N\pi e} + o(1)$$

where |G(T, N)| is the volume of the Grassmann manifold
G(T, N), and $E[\log \det HH^*] = \sum_{i=1}^{N} E[\log \chi^2_{2i}]$ for $\chi^2_{2i}$
chi-square distributed with dimension 2i.

This  capacity  is  asymptotically  achieved  by  the  constant  equal 
norm  input  P( A  =  %/T/jv)  =  1.  Under  this  input,  one  can 
show that all the mutual information is carried by the random
subspace $\Omega_X$, which lies in the Grassmann manifold G(T, N)
with dimension N(T − N). Therefore, the number of degrees of
freedom available to communicate non-coherently is N(T − N)
per T symbol times. The capacity gains N(1 − N/T) bps/Hz
for each 3 dB SNR increase.
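The degrees-of-freedom count can be tabulated directly. The following small sketch only evaluates the two expressions quoted above, N(T − N) and N(1 − N/T); the function names are hypothetical.

```python
def degrees_of_freedom(N, T):
    """Dimension N(T - N) of G(T, N): degrees of freedom per coherence interval."""
    if not 0 < N <= T:
        raise ValueError("expected 0 < N <= T")
    return N * (T - N)

def capacity_slope(N, T):
    """Capacity gain in bps/Hz per 3 dB of SNR, N(1 - N/T)."""
    return degrees_of_freedom(N, T) / T

def slope_table(T):
    """Slope for every N up to T // 2, the range covered by the theorem's condition on T."""
    return {N: capacity_slope(N, T) for N in range(1, T // 2 + 1)}
```

For a coherence time of T = 8, for example, the slope grows from 0.875 bps/Hz per 3 dB with one antenna to 2 bps/Hz per 3 dB with four, compared with N bps/Hz per 3 dB in the coherent case.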

The capacity result above can also be interpreted as sphere
packing in the Grassmann manifold.
Abstract — In this paper, we design multiple-antenna
signal constellations for a Rayleigh flat-fading
channel unknown to the receiver. It is shown that
good signal constellations correspond to packings with
large minimum distances in complex Grassmannian
space. We describe a numerical optimization procedure
for finding such packings. The corresponding
signal constellations improve significantly upon previously
best-known signal constellations.

I.  Introduction 

This paper concerns the design of unitary space-time
constellations [2] for an M transmit-antenna communication
system operating in a Rayleigh flat-fading environment. Assume
that the fading coefficients among different pairs of
transmit and receive antennas are statistically independent
and are unknown to the receiver. The fading coefficients remain
constant for a coherence interval of T symbol periods,
and then change simultaneously to independent realizations.
A unitary space-time constellation of cardinality L is a collection
of L complex orthonormal matrices of size T × M, where
the i-th column of each matrix contains the symbols transmitted
over a coherence interval through the i-th transmit antenna.

We show that for a small number of transmit antennas,
the pairwise probability of error between two distinct signal
points, $\Phi_1$ and $\Phi_2$, of a unitary space-time constellation
is related to the correlation $\langle \Phi_1^* \Phi_2, \Phi_1^* \Phi_2 \rangle$, where $\langle A, B \rangle =
\sum_{j,k} A_{jk} B_{jk}^*$. Thus, the maximum correlation between two
distinct signal points can be used as a figure of merit and the
problem of finding good unitary space-time constellations can
be stated as follows:

Unitary space-time constellation design problem:
Given natural numbers T, M, and L, with M < T, find a collection
$S = \{\Phi_1, \Phi_2, \ldots, \Phi_L\}$ of T × M complex orthonormal
matrices such that the maximum correlation, given by

$$\sigma^*(S) = \max_{i \neq j} \langle \Phi_i^* \Phi_j, \Phi_i^* \Phi_j \rangle \qquad (1)$$

is minimized.
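To make the figure of merit concrete, here is a minimal sketch that evaluates the maximum correlation of a small constellation. It assumes the inner product ⟨A, B⟩ is the sum over all entries of A_jk times the conjugate of B_jk, and that the correlation of two signal points is that inner product of their cross-Gram matrix with itself. Matrices are plain lists of rows, and all function names are hypothetical.

```python
def cross_gram(A, B):
    """Return A^* B (conjugate transpose of A times B) for matrices given as lists of rows."""
    T, m_a, m_b = len(A), len(A[0]), len(B[0])
    return [[sum(A[t][i].conjugate() * B[t][j] for t in range(T))
             for j in range(m_b)]
            for i in range(m_a)]

def inner(A, B):
    """<A, B> = sum over j, k of A_jk * conj(B_jk)."""
    return sum(a * b.conjugate()
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

def max_correlation(constellation):
    """Largest pairwise correlation over all distinct pairs of signal points."""
    worst = 0.0
    for i, P in enumerate(constellation):
        for Q in constellation[i + 1:]:
            G = cross_gram(P, Q)
            worst = max(worst, inner(G, G).real)
    return worst
```

For T = 2 and M = 1, the three unit vectors (1, 0), (0, 1) and (1, 1)/√2 give a maximum correlation of 1/2, attained by either coordinate vector paired with the diagonal one.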

II.  Complex  Grassmannian  Space 
The complex Grassmannian space G(T, M, C) is the set of
all M-dimensional subspaces of $\mathbb{C}^T$. Let $\Phi_1, \Phi_2$ be two
T × M complex orthonormal matrices whose column spaces
are $P_1, P_2 \in G(T, M, C)$ respectively. The squared distance
between $P_1$ and $P_2$ can be defined as

Aa.  ft)  =  r 

We refer to a finite subset of the complex Grassmannian space
G(T, M, C) as a packing in G(T, M, C). The squared minimum
distance $d^2(S)$ of a packing S is given by

$$d^2(S) = \min_{P_i, P_j \in S,\; P_i \neq P_j} d^2(P_i, P_j) \qquad (2)$$


From (1) and (2), it follows that the problem of designing
good unitary space-time constellations is the same as the
problem of finding packings in complex Grassmannian space
that have large minimum distances.

III.  Optimization  Technique 

Since the complex Grassmannian space G(T, M, C) is a differentiable
manifold, the parameters involved in the optimization
problem given by (1) lie in a differentiable manifold. Thus, one
can consider the direct minimization of σ*(S) using gradient
search algorithms. Unfortunately, σ*(S) has many local minima
that are far away from the global minima. Moreover,
σ*(S) is not very smooth; in fact, it is not even differentiable
everywhere.

In order to circumvent these difficulties, we introduce a
family of potential functions $f_\alpha(S)$ with the following properties:
for all α the functional $f_\alpha$ is smooth; for small values of
α the functional $f_\alpha$ has few local minima; the functionals $f_\alpha$
mimic σ* as α → ∞.

The search procedure starts with a relatively small value
of α, say $\alpha_0$, and a randomly generated unitary space-time
constellation S. It uses numerical optimization techniques to
find a set $S_{\alpha_0}$ such that the value of $f_{\alpha_0}$ is (nearly) locally
minimized. Next, we slightly increase α to $\alpha_1$ and, starting
from the set $S_{\alpha_0}$, find a new set $S_{\alpha_1}$ that (nearly) locally
minimizes $f_{\alpha_1}$. We continue in this manner, each time increasing
the value of α slightly and tracking the minimizer of $f_\alpha$. For
very large values of α, $f_\alpha$ is essentially equivalent to
σ*, and minimizing $f_\alpha$ also essentially minimizes σ*.
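The text does not specify the potential functions, so the sketch below uses one standard choice as a stand-in: the log-sum-exp surrogate (1/α) log Σ exp(α c), applied to a list of pairwise correlations c. This surrogate is smooth in its arguments and approaches the maximum as α grows; it is an assumption for illustration, not necessarily the family used by the authors, and the names are hypothetical.

```python
import math

def smooth_max(values, alpha):
    """Log-sum-exp surrogate (1/alpha) * log(sum(exp(alpha * v))).

    Assumed stand-in for the potential f_alpha: smooth for every alpha > 0
    and converging to max(values) from above as alpha increases.
    """
    m = max(values)
    # Subtracting the maximum before exponentiating avoids overflow for large alpha.
    return m + math.log(sum(math.exp(alpha * (v - m)) for v in values)) / alpha

def surrogate_gap(values, alphas):
    """Gap between the surrogate and the true maximum for each alpha in a schedule."""
    m = max(values)
    return [smooth_max(values, a) - m for a in alphas]
```

The gap is at most log(n)/α for n values, which is why a slowly increasing schedule of α lets the minimizer be tracked from a smooth landscape toward the non-smooth objective.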

IV.  Results 

Using  the  numerical  technique  described  above,  we  gener¬ 
ate  unitary  space-time  constellations  of  cardinality  2T  for 
M = 1, 2, 3 and T = 5, 6, . . . , 10 [1]. In all cases, these constellations
improved upon previously best known constellations
[3]. The unitary space-time constellations generated here can
be used as a benchmark to assess the optimality of other signal
constellations designed subject to constraints such as
the ability to ‘easily encode and decode’.
SUMMARY

As  new  classes  of  iteratively  decodable  very 
long  block-length  codes  are  discovered,  they 
appear  to  have  the  potential  of  achieving 
extremely  low  error  probabilities  close  to  the 
Shannon  limit.  While  the  degree  to  which  the 
suboptimal  iterative  decoding  process 
degrades  this  performance  is  not  yet  well 
determined,  progress  has  occurred  in 
establishing  the  optimal  maximum  likelihood 
performance. 

Following  the  approaches  of  [1]  and  [2],  we 
obtain  that  for  arbitrarily  long  block  lengths 
on  a  Gaussian  channel,  vanishingly  low  error 
probabilities  can  be  achieved  by  maximum 
likelihood  decoding  provided 

r(d) < K(Es/No, d)   for 0 < d < 1

where  r(d)  is  the  code  “rate-weight  function”, 
which  is  the  normalized  logarithm  of  the 
ensemble  average  number  of  codewords  of 
normalized  distance  d,  and  K(Es/No,d)  is  a 
function  only  of  the  channel  parameters.  We 
have  evaluated  K(  )  for  the  SNR  range  of 
interest.  For  certain  serial-concatenated  and 
accumulated-convolutional  codes  [3],  r(d) 
has  been  approximated,  from  which  the 
closeness  of  the  code’s  performance  to  the 
Shannon  limit  can  be  estimated. 
I. Introduction

Although turbo codes with iterative decoding have been
shown to achieve bit-error rates (BER’s) close to the Shannon
limit, they suffer from three disadvantages: a large decoding
delay, an error floor at low BER’s, and a relatively poor frame
error performance (FER). This paper presents an interactive
concatenated turbo coding system in which a Reed-Solomon
outer code is concatenated with a binary turbo inner code. In
the proposed system, the outer code decoder and the inner
turbo code decoder interact to achieve both good bit error
and frame error performances. Also presented in the paper
are an effective criterion for stopping the iterative decoding
process and a new reliability-based decoding algorithm, called
the Chase-GMD algorithm, for nonbinary codes.

II.  The  Chase-GMD  Decoding  Algorithm 

The Chase-GMD decoding algorithm is a reliability-based decoding
algorithm which combines the Chase-2 and GMD algorithms.
Consider an (n0, k0, d) RS code over GF(q) with
q = 2^m. Let y be the received sequence and z be the hard-decision
received sequence. Without loss of generality, we assume
that the hard-decision received symbols in z are ordered
in the order of increasing reliability. We also assume that an
error-and-erasure algebraic decoder is used to generate candidate
codewords. For 0 ≤ P ≤ ⌊d/2⌋, let E denote the
set of test error patterns with errors (nonzero components)
confined to the P least reliable positions. Let A_i(q') denote
the set of the q' ≤ q most probable symbols in GF(q) at the i-th
symbol position, 0 < i ≤ P. The error at the i-th position
of E is chosen from A_i(q'). Let CGA(P, q') denote
the Chase-GMD algorithm with parameters P and q'. This
CGA(P, q') processes all the vectors w = z + e with e in E.
Let I(P) = {i : 0 ≤ i ≤ d − 2P − 1 and d − i is odd}.

For each w and each integer i ∈ I(P), erase i symbols of w
starting from symbol position P + 1 to symbol position P + i.
This results in a vector w* with i erasures. Perform error-and-erasure
decoding on w*. If decoding is successful, the
decoded codeword is a candidate codeword. After performing
q'^P (⌊(d + 1)/2⌋ − P) decodings, we obtain a set of candidate
codewords. Among these candidate codewords, the one with
the best metric is the decoded codeword. The performance of
CGA(P, q') improves as P increases.
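The bookkeeping of the algorithm can be checked with a few lines of code. The sketch below enumerates the erasure counts, reading the set as all i with 0 ≤ i ≤ d − 2P − 1 and d − i odd, and counts the decoding attempts as q'^P times the size of that set. The non-strict inequalities are an assumption; they are the reading under which the set has ⌊(d + 1)/2⌋ − P elements, as the decoding count requires. Function names are hypothetical.

```python
def erasure_counts(d, P):
    """Erasure counts i with 0 <= i <= d - 2P - 1 and d - i odd (assumed reading)."""
    return [i for i in range(0, d - 2 * P) if (d - i) % 2 == 1]

def num_decodings(d, P, q_prime):
    """Number of error-and-erasure decodings: q'^P test patterns times the erasure counts."""
    return q_prime ** P * len(erasure_counts(d, P))

def erased_positions(P, i):
    """Positions P + 1, ..., P + i (1-indexed, least reliable first) erased for a given i."""
    return list(range(P + 1, P + i + 1))
```

For a distance-17 code with P = 2 and q' = 2, the erasure counts are 0, 2, ..., 12, so each of the four test patterns is decoded seven times, for 28 decodings in total.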

III.  A  Concatenated  Turbo  Coding  System 

To construct a concatenated turbo coding system, a turbo
code with two block component codes is chosen as the inner
code, and an (n0, k0, d) RS code over GF(2^m) is chosen as
the outer code. At the decoder, the received sequence is first
turbo decoded in parallel mode [1], i.e., the two component code
decoders operate simultaneously. At each phase of a decoding
iteration, the two decoders produce two decoded information
sequences, each segmented into A vectors with n0 symbols

1This research was supported by NSF under Grants NCR 94-15374,
OCR 97-32959, CCR 98-14054 and NASA under Grants
NAG 5-931 and NAG 5-8414.


over GF(2^m). Then, compare each pair of corresponding vectors
from the two turbo decoders, and count the symbol
positions in which the two corresponding symbols do not match.
If the number of mismatched symbol positions for each pair
of corresponding n0-vectors is less than or equal to the error
correcting capability of the outer RS code, ⌊(d − 1)/2⌋, we
stop the inner turbo decoding iteration and let the outer code
decoder, with algebraic decoding, take over and complete
the decoding process. This new stopping criterion for inner
turbo decoding is called the symbol matching (SM) criterion. It
saves more decoding iterations and requires much less computational
complexity than the cross-entropy (CE) criterion in
[2]. If the outer code decoding is not successful (decoding failure),
the outer code decoder instructs the inner turbo decoder
to continue its decoding iterations from the phase where it was
stopped until the number of symbol errors at the input of the
outer decoder is reduced to within the error correction capability
of the outer code. The interactive process continues until either the
outer decoding is successful or a preset maximum number of
decoding iterations for the inner turbo decoder is reached. In
the latter case, the outer code decoder computes the reliability
values of its input symbols based on the soft output information
(log-likelihood ratios of the decoded bits) of the inner turbo
code decoder and carries out the reliability-based CGA(P, q')
decoding.
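The symbol matching test itself is simple to state in code. The following sketch assumes the two decoder outputs are already segmented into equal-length symbol vectors and takes the threshold to be ⌊(d − 1)/2⌋; the function names and the list-of-lists data layout are hypothetical.

```python
def mismatches(u, v):
    """Number of symbol positions in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def symbol_matching_stop(vectors_a, vectors_b, d):
    """True when every pair of corresponding vectors differs in at most floor((d - 1) / 2) symbols.

    vectors_a and vectors_b hold the segmented outputs of the two component
    decoders; d is the minimum distance of the outer RS code.
    """
    t = (d - 1) // 2
    return all(mismatches(u, v) <= t for u, v in zip(vectors_a, vectors_b))
```

The check needs only symbol comparisons on hard decisions, which is in line with the low computational cost claimed for the criterion.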

IV.  Simulation  Results 

Consider a concatenated turbo coding system in which the
(228, 212) shortened RS code over GF(2^8) is used as the outer
code and the (64, 57) distance-4 extended Hamming code is
used as the two component codes for constructing the inner
turbo code. The rate of this system is η = 0.75. The bit-error
and frame-error performances of this system for the AWGN
channel are shown below. We see the waterfall performance
without an error floor. It is 1.3 dB away from the Shannon limit.
Abstract — In a parallel concatenated convolutional
code (pccc) an information word is encoded by a first
convolutional encoder and an interleaved version of
the information word is encoded by a second convolutional
encoder. We discuss the situation in which we
require that both convolutional encoders end in the all
zero state. To do so, we have to split the information
word in two parts: one part containing information
bits, and a second part containing bits, called tail-bits,
computed such that both encoders end in the
all zero state, which we call simultaneous zero-tailing.
Depending on the structure of the interleaver, a different
number of tail-bits is needed. By using a constructive
method we characterize all interleavers for
a prescribed number of tail-bits. We explain methods
of encoding. In addition, simulations have been
carried out to investigate the performance of simultaneous
zero-tailing. They show that simultaneous
zero-tailing is similar in performance to
previously known zero-tailing methods and that it is
better than zero-tailing just one of the encoders.

I.  Mathematical  Characterization 

We know that the joint end state [S1, S2] of the two encoders
is a linear function of the information word I. This linear
function depends on the interleaver π used. Moreover, the
dimension of the space of end states is between k (fully simultaneous
zero-tailing) and 2k (fully independent zero-tailing),
where k is the memory of both encoders. We have derived a
characterization of interleavers with which simultaneous
zero-tailing is possible. Our method, being more general, gives a
larger class of simultaneous zero-tailing interleavers than
discussed in the literature so far [1, 2].

Using the mathematical characterization we have developed
in this work, it is possible to design an interleaver such
that the dimension of the space of end states is a given number
k + s for any s ∈ {0, ..., k}. This characterization can be
used in several ways. Given an interleaving permutation one
can compute the number of zero-tailing bits that are necessary
for simultaneous zero-tailing. Conversely, the characterization
allows counting and construction of interleavers with
a prescribed number of tail-bits. This construction can be
augmented to look for interleavers with large spread [3].
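The first use mentioned above, computing the number of tail-bits for a given permutation, can be sketched by exploiting linearity: the joint end state is determined by the end states produced by unit-impulse inputs, and the dimension of their span is a rank over GF(2). The example below is an illustration only. It assumes a memory-3 recursive encoder with feedback polynomial 1 + D^2 + D^3, and the function names and the convention that the second encoder reads bit perm[j] at time j are hypothetical.

```python
def end_state(bits, taps=(0, 1, 1)):
    """Final register state of a recursive encoder; taps (0, 1, 1) means feedback 1 + D^2 + D^3."""
    state = [0] * len(taps)
    for u in bits:
        feedback = u
        for s, t in zip(state, taps):
            feedback ^= s & t
        state = [feedback] + state[:-1]
    return state

def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    rank = 0
    pending = list(vectors)
    while pending:
        pivot = pending.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot
        pending = [v ^ pivot if v & low else v for v in pending]
    return rank

def end_state_dimension(perm, taps=(0, 1, 1)):
    """Dimension of the space of joint end states [S1, S2] for interleaver perm."""
    n = len(perm)
    images = []
    for t in range(n):
        word = [0] * n
        word[t] = 1  # unit impulse at time t
        interleaved = [word[perm[j]] for j in range(n)]
        joint = end_state(word, taps) + end_state(interleaved, taps)
        images.append(int("".join(map(str, joint)), 2))
    return gf2_rank(images)
```

With the identity permutation both encoders see the same sequence, so the dimension equals the memory k = 3; a general permutation gives a value between k and 2k, as stated above.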

II.  Simulation  Results 

We  performed  experiments  in  order  to  answer  the  following 
questions. 

1. How does zero-tailing both encoders compare to only
one encoder being zero-tailed?

2. How does the simultaneous zero-tailing compare in performance
to the zero-tailing strategy proposed for the
UMTS standard (both encoders of the pccc are zero-tailed
separately)?


Observe that the number of additional bits sent through the
channel (tail-bits and their corresponding parity bits) amounts
to 4k for independent zero-tailing of both encoders (such as in
UMTS), whereas our proposal requires between 3k and 6k bits
depending on the interleaver used. We use a pccc with 8-state
constituent encoders and an interleaver of size 150 and spread
4 for the simulations. The four schemes compared are (1) both
encoders are truncated, (2) one encoder is zero-tailed and the
other is truncated, (3) both encoders are zero-tailed using the
zero-tailing strategy proposed for the UMTS standard and (4)
both encoders are zero-tailed using a simultaneous zero-tailing
interleaver leading to 3k additional bits.


Figure  1:  WER  for  the  four  schemes  discussed  for  pccc 
with  8-state  constituent  codes  and  interleaver  size  150, 
spread  4. 


We have plotted the WER corresponding to the above four
experiments for different signal-to-noise ratios (Figure 1).
Simulations show that our method of zero-tailing is comparable
in performance to previously known zero-tailing techniques
and better than termination strategies in which only one encoder
is zero-tailed.
Abstract — In this paper, we present a simple method
using a single-error correcting BCH outer code in order to
improve the performance of turbo codes. It is shown that
this method reduces the errors dramatically at moderate-to-high
Eb/N0.

I.  Introduction 

Narayanan  and  Stuber  proposed  a  selective  serial 
concatenation  of  turbo  codes  using  a  double-error  correcting 
BCH  outer  code  to  protect  the  non-zero  bit  positions  in  the 
weight  2  inputs  generating  many  low-weight  codewords  [1]. 
In  this  paper,  we  present  a  new  method,  using  a  single-error 
correcting  BCH  outer  code,  to  protect  most  of  the  non-zero  bit 
positions  corresponding  to  low-weight  codewords.  We  show 
that  this  method  has  a  simpler  decoder,  better  performance 
and  a  smaller  loss  of  code  rate  compared  to  the  scheme  of  [1]. 

II. Turbo Codes with a Single-error BCH Outer
Code

The low-weight inputs generating low-weight codewords
for a (7, 5) component code and a length-512 random interleaver are
listed in Table 1. The errors of one frame occur
simultaneously in the non-zero bit positions associated with
low-weight codewords [2]. So, if we select one non-zero
bit position from each information word associated with
low-weight codewords, the number of errors occurring in
the selected bit positions is mostly one at moderate-to-high
Eb/N0. For example, 15 bit positions are identified from the
groups in Table 1 (e.g. 134, 165, 179, 474, 9, 16, 94, 112, 171, 214,
224, 243, 88, 130), and then 11 information bits are encoded by
a (15, 11) single-error correcting BCH code. The 15 bits from
the BCH code are interposed in the above 15 bit positions.
Finally, the encoder encodes the overall information frame
using the turbo encoder and transmits the codeword through the
channel. If the bit errors are not corrected by the iterative
decoding of turbo codes, most of them are errors in the non-zero
bit positions in Table 1 at moderate-to-high Eb/N0. So,
one of the errors can be found using a single-error correcting
BCH decoder. From the error bit position found by the BCH
decoder, we can find the other error bit positions from Table 1.

Table 1: Non-zero bit positions generating the codewords of weight less
than 10.

Distance (d)   Bit positions in information frame
6              (134, 137) (165, 168) (179, 182) (474, 477) (491, 494)
8              (9, 18) (16, 19) (94, 103) (112, 115) (171, 174) (214, 217)
               (224, 233) (243, 252)
9              (88, 95, 96) (130, 131, 132) (368, 376, 378) (453, 460, 464)
               (476, 478, 483)
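The final step of the scheme, going from the one error located by the BCH decoder to the remaining error positions, amounts to a table lookup. The sketch below encodes Table 1 as a dictionary and returns the companion positions of a located error; the data layout and function name are hypothetical, and the BCH decoding itself is not shown.

```python
# Table 1: groups of non-zero bit positions, keyed by codeword distance.
GROUPS = {
    6: [(134, 137), (165, 168), (179, 182), (474, 477), (491, 494)],
    8: [(9, 18), (16, 19), (94, 103), (112, 115), (171, 174), (214, 217),
        (224, 233), (243, 252)],
    9: [(88, 95, 96), (130, 131, 132), (368, 376, 378), (453, 460, 464),
        (476, 478, 483)],
}

def companion_positions(located):
    """Return (distance, other positions) for the group containing the located error.

    Returns None when the position does not appear in Table 1.
    """
    for distance, groups in GROUPS.items():
        for group in groups:
            if located in group:
                return distance, [p for p in group if p != located]
    return None
```

For instance, an error located at position 134 points to position 137 as the other likely error, and an error at position 88 points to positions 95 and 96.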

III. Simulation and Discussion

The rate 1/2 turbo code for the simulations consists of two
(7, 5)_8 RSC codes linked by a length-512 random interleaver.
The MAP algorithm was used for iterative decoding of turbo codes.
BPSK modulation and an AWGN channel were also assumed.
For the outer code, a (31, 21) double-error correcting BCH
code and a (63, 57) single-error correcting BCH code were
used. The proposed scheme (1-BCH turbo) was simulated
using fewer parity bits than the scheme of [1] (2-BCH turbo).
Fig.  1  shows  that  the  turbo  code  using  a  BCH  outer  code  is 
superior  to  the  original  turbo  code  (turbo)  at  moderate-to- 
high  Eb/N0.  The  proposed  scheme  shows  a  performance 
improvement  of  0.75  dB  compared  to  the  original  turbo  code 
and  0.15  dB  compared  to  the  scheme  of  [1]  at  BER  lOA  The 
FER  performance  improvement  is  l.OOdB  compared  to  the 
original  turbo  code  and  0.20dB  compared  to  the  scheme  of  [1] 
at  FER  10-4. 

The proposed scheme has only a slight reduction of code
rate because many non-zero bit positions are protected by a small
number of parity bits. Moreover, the proposed scheme has the advantage
of protecting the non-zero bit positions in information frames
of weight greater than 2. Due to these two characteristics, the
proposed scheme is superior to the scheme of [1] in
performance. Since single-error correcting BCH codes have
a very simple decoding structure, the decoder complexity
and the decoding time are reduced.


Fig.  1:  The  performance  of  the  original  turbo  code  and  the  turbo 
code  with  the  BCH  outer  codes  after  7  iterations. 
Recall that the Huffman code is built by an iterative algorithm
over the associated Huffman tree, in which the two nodes with
the lowest weights are combined into a new node whose weight
is the sum of the weights of its two children. Such a
construction is not unique, but with a simple modification
to the Huffman algorithm it is possible to construct
a unique Huffman code in which the longest code words are
as short as possible (cf. [6]). Hereafter, we deal with such
modified Huffman codes and present precise asymptotic results
on the average redundancy of such codes for memoryless
sources.

Given a probabilistic source model, we let $P(x^n)$ be the
probability of the message $x^n \in \mathcal{A}^n$. For a code $C_n$, we denote
by $L(C_n, x^n)$ the code length for $x^n$. The average redundancy
$R_n(C_n, P)$ is defined as

$$R_n(C_n) = E_{X^n}[R_n(C_n, P; X^n)] = E[L(C_n, X^n)] - H_n(P),$$

where $H_n(P)$ is the entropy, and $E$ denotes the expectation.

To the best of our knowledge, no asymptotic results have
been reported in the literature on the average redundancy of
Huffman codes. However, many elegant, insightful and useful lower
and upper bounds on $R_n^H$ are known. Gallager [4] proved that
$R_n^H \le p_1 + \lg(2(\lg e)/e) \approx p_1 + 0.086$, where $p_1$ is the
probability of the most likely symbol. This bound was further
improved by Capocelli and de Santis [2], Stubley [6] and others
(cf. [1]).

Let $p$ denote the probability of generating a 0 and $q = 1 - p$
denote the probability of emitting a 1. Throughout,
we assume that $p < 1/2$. Certainly, this does not restrict the
generality of the analysis.

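We start with the average redundancy of the Shannon-Fano
code of a block $x^n$ of length $n$. It assigns code length
$\lceil -\log_2 (p^k (1-p)^{n-k}) \rceil$ to the block $x^n$, where $k$ is the number
of symbols in $x^n$ that have probability $p$. Thus, its average redundancy is

$$R_n^{SF} = 1 - \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \langle \alpha k + \beta n \rangle,$$

where $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$,
$\alpha = \log_2((1-p)/p)$ and $\beta = \log_2(1/(1-p))$.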
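The Shannon-Fano redundancy can be evaluated numerically straight from its definition (expected code length minus entropy). The sketch below is an illustration under stated assumptions: the function name is hypothetical, $k$ counts the symbols of probability $p$, and floating-point arithmetic is assumed accurate enough that no block probability is an exact power of two (true for the example used).

```python
import math


def shannon_fano_redundancy(n, p):
    """Average redundancy E[ceil(-log2 P(X^n))] - H_n(P) of the Shannon-Fano block code.

    A block with k symbols of probability p and n - k symbols of probability
    1 - p has probability p**k * (1 - p)**(n - k).
    """
    q = 1.0 - p
    total = 0.0
    for k in range(n + 1):
        weight = math.comb(n, k) * p**k * q**(n - k)      # probability of the class
        info = -(k * math.log2(p) + (n - k) * math.log2(q))  # -log2 of block probability
        total += weight * (math.ceil(info) - info)         # per-block redundancy
    return total
```

For $p = 1/3$ we have $\alpha = \log_2 2 = 1$, so $\langle \alpha k + \beta n \rangle = \langle \beta n \rangle$ for every $k$ and the redundancy reduces to $1 - \langle \beta n \rangle$ with $\beta = \log_2(3/2)$, which gives a convenient sanity check.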

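Theorem 1 Consider the Shannon-Fano code for blocks of length
$n$ that are binomially$(n, p)$ distributed over a binary alphabet. Then,
for $p < 1/2$, as $n \to \infty$,

$$R_n^{SF} = \begin{cases} \frac{1}{2} + o(1), & \alpha \text{ irrational}, \\ \frac{1}{2} - \frac{1}{M}\left(\langle Mn\beta \rangle - \frac{1}{2}\right) + O(\rho^n), & \alpha = \frac{N}{M}, \end{cases}$$

where $\rho < 1$ and $\gcd(N, M) = 1$.

This work was supported in part by NSF Grants NCR-9415491
and CCR-9804760.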

Now we are in a position to summarize our results for the
Huffman code. Stubley [6] was led to the following asymptotic
formula for the average redundancy of the Huffman code for the block
$x^n$ generated by a memoryless source:

$$R_n^H = 1 + R_n^{SF} - 2\sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\, 2^{-\langle \alpha k + \beta n \rangle} + O(\rho^n),$$

where $\rho < 1$.


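Theorem 2 Consider the Huffman code for blocks of length $n$ that are
binomially$(n, p)$ distributed over a binary alphabet. Then, for
$p < 1/2$, as $n \to \infty$,

$$R_n^H = \begin{cases} \frac{3}{2} - \frac{1}{\ln 2} + o(1) \approx 0.057304, & \alpha \text{ irrational}, \\ \frac{3}{2} - \frac{1}{M}\left(\langle \beta M n \rangle - \frac{1}{2}\right) - \frac{1}{M(1 - 2^{-1/M})}\, 2^{-\langle n\beta M \rangle / M}, & \alpha = \frac{N}{M}, \end{cases}$$

where $N$, $M$ are integers such that $\gcd(N, M) = 1$.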


Observe that if we set $x = \langle Mn\beta \rangle$ in the rational case, then

$$\max_{0 < x < 1} R_n^H = 1 - \frac{1 + \ln\ln 2}{\ln 2} = \lg(2(\lg e)/e) = 0.08607\ldots,$$

which is the Gallager upper bound (since the most likely
probability $p_1 = O(1/\sqrt{n})$ in this case). We formulate it as a
corollary.


Corollary 1 The maximum value of the average Huffman
redundancy is

$$\max\{R_n^H\} \to 1 - \frac{1 + \ln\ln 2}{\ln 2} = \lg(2(\lg e)/e) = 0.08607\ldots$$

as $n \to \infty$.
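For small block lengths the average Huffman redundancy can be computed exactly and compared with Gallager's bound $p_1 + 0.086$. The following is a minimal sketch, not the modified (minimum-variance) Huffman construction discussed in the text: it runs the plain Huffman merge on all $2^n$ block probabilities, so $n$ has to stay small. The function name is hypothetical.

```python
import heapq
import math


def huffman_block_redundancy(n, p):
    """Average redundancy of a Huffman code built for all 2**n blocks of length n.

    Uses the fact that the expected codeword length equals the sum of the
    probabilities of all internal nodes created by the Huffman merges.
    """
    q = 1.0 - p
    heap = [p**k * q**(n - k) for k in range(n + 1) for _ in range(math.comb(n, k))]
    heapq.heapify(heap)
    average_length = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)  # two least likely nodes
        average_length += merged
        heapq.heappush(heap, merged)
    entropy = -n * (p * math.log2(p) + q * math.log2(q))
    return average_length - entropy
```

Calling `huffman_block_redundancy(10, 0.3)` evaluates the redundancy for 1024 blocks; the result should lie between 0 and $p_1 + 0.086$, where $p_1 = 0.7^{10}$ is the probability of the most likely block.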
I. Introduction

Huffman codes, or minimum-redundancy prefix codes, are among
the most widespread compression techniques today.

Canonical codes are a subclass of Huffman codes, described
by Connell [1] and by Schwartz and Kallick [4]. Canonical codes
have the numerical sequence property, i.e., codewords with the
same length are binary representations of consecutive integers.
Once the length of the current codeword is known, it
can be decoded by several arithmetic operations in the following
way. Suppose we have read the prefix $b$ of the current
codeword and all codewords with prefix $b$ have length $l$.
The indexes of the codewords with prefix $b$ are consecutive integers
and the codewords themselves are binary representations of
consecutive integers. Let $h$ be the length of the prefix $b$, $first_l$
be the index of the first codeword with length $l$, and $b_n$ be the
value of the next $l - h$ bits. Then the index of the current
codeword can be computed as $b_n + first_l$. This idea is used in
the algorithm described in [3]. Single bits are read from the
input stream until the codeword length $l$ can be determined;
then we read another $l - h$ bits and compute the codeword index
with the above formula. A special data structure, called an
sk-tree, is used to check whether a codeword length can be
determined from the bits read so far.
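The arithmetic decoding of a canonical code can be illustrated with a short sketch. It is a textbook-style bit-by-bit variant, not the sk-tree algorithm of [3]: it assumes the common canonical assignment in which shorter codewords receive numerically smaller prefixes, and all function names are hypothetical.

```python
def canonical_codes(lengths):
    """Assign canonical codewords; lengths maps symbol -> codeword length."""
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    codes, code, previous = {}, 0, 0
    for s in symbols:
        code <<= lengths[s] - previous      # append zeros when the length grows
        codes[s] = (code, lengths[s])
        code += 1                           # next codeword is the next integer
        previous = lengths[s]
    return symbols, codes


def canonical_decode(bits, lengths):
    """Decode a list of 0/1 bits using only comparisons and additions."""
    symbols, codes = canonical_codes(lengths)
    first_code, first_index, count = {}, {}, {}
    for index, s in enumerate(symbols):
        code, length = codes[s]
        if length not in first_code:
            first_code[length], first_index[length], count[length] = code, index, 0
        count[length] += 1
    decoded, value, length = [], 0, 0
    for bit in bits:
        value = 2 * value + bit
        length += 1
        if length in first_code:
            offset = value - first_code[length]
            if 0 <= offset < count[length]:            # codeword of this length found
                decoded.append(symbols[first_index[length] + offset])
                value, length = 0, 0
    return decoded
```

With lengths `{"a": 1, "b": 2, "c": 3, "d": 3}` the codewords are 0, 10, 110 and 111, and the symbol index is obtained as the offset from the first codeword of the same length, in the spirit of the formula $b_n + first_l$.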

II.  Decoding  with  sequential  look-up  tables 

In  this  work  we  describe  a  table  look-up  decoding  method. 
It  leads  to  fast  decoding  without  causing  too  high  memory 
requirements.  Besides  that,  combined  with  a  special  data 
structure,  it  enables  memory-efficient  decoding  without  bit- 
oriented  processing  of  the  input  stream. 

Let $l_{\min}$ and $l_{\max}$ denote the minimal and maximal codeword
lengths, respectively; $h_{\min}$ and $h_{\max}$ will denote the minimal and
maximal codeword lengths for codewords with prefix $b$.

Instead of reading a fixed number of bits, we use the
codeword prefix already read to determine a possible codeword
length. We do not traverse a Huffman tree bit by bit, but read
at each stage as many bits as possible. Thus, we begin by
reading $l_{\min}$ bits. If the length of the current codeword
equals $l_{\min}$, then the corresponding symbol is output.
If the symbol length can be identified from the bits already read,
the next $l_b$ bits are read; otherwise, the next $h_{\min} - l_{\min}$ bits are read.
The process is repeated until a symbol is output. This process
can be implemented with a series of tables. Every table record
consists of two fields. One field indicates whether
a codeword has been read or the next table has to be used.
The second field contains either the symbol corresponding to a
codeword or a pointer to the next table. We look up the value
of the first $l_{\min}$ bits in the first table. If the bits read so far
constitute a codeword, we output the corresponding symbol;
otherwise we read the next bit sequence.

The number of records in all tables does not exceed the
number of nodes in the Huffman tree; therefore $2n - 1$ is an
upper bound for the number of table records. Let $S$ be the
set of codeword prefixes $b$ such that the length of $b$ equals the
length of some codeword in the Huffman code. Let $length(b)$
be the length of $b$. Then the total number of records can be
computed as $\sum_{b \in S} 2^{h_{\min} - length(b)} + 2^{l_{\min}} + n$.

Procedure Read_Next_Symbol( )
begin
  { bitval holds the value of the bits read for the current table }
  while (table[bitval].type <> DIRECT_DECODE) do
  begin
    next_length := table[bitval].length;     { number of bits to read next }
    table := table[bitval].next_table;       { follow the pointer to the next table }
    bitval := get_next_bits(next_length);
  end;
  output table[bitval].value;                { the record holds a decoded symbol }
end
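The sequential look-up tables can be prototyped as follows. This is a sketch under stated assumptions rather than the authors' implementation: the code is given as a dictionary of codeword strings, each table is indexed by just enough bits to reach the shortest codeword below the current prefix, and the record kinds `NEXT_TABLE` and `UNUSED` as well as both function names are hypothetical.

```python
def build_table(codebook, prefix=""):
    """Build the table reached after reading `prefix`.

    codebook maps codeword strings (e.g. "110") to symbols and is prefix-free.
    Returns (width, records): `width` bits index the list `records`.
    """
    below = [cw for cw in codebook if cw.startswith(prefix)]
    width = min(len(cw) for cw in below) - len(prefix)   # bits to the shortest codeword
    records = []
    for value in range(1 << width):
        extended = prefix + format(value, "0{}b".format(width))
        if extended in codebook:
            records.append(("DIRECT_DECODE", codebook[extended]))
        elif any(cw.startswith(extended) for cw in codebook):
            records.append(("NEXT_TABLE", build_table(codebook, extended)))
        else:
            records.append(("UNUSED", None))              # only for incomplete codes
    return (width, records)


def decode_all(bits, root):
    """Decode a string of '0'/'1' characters; assumes a valid, complete input."""
    symbols, pos = [], 0
    while pos < len(bits):
        width, records = root
        while True:
            value = int(bits[pos:pos + width], 2)         # read a group of bits
            pos += width
            kind, payload = records[value]
            if kind == "DIRECT_DECODE":
                symbols.append(payload)
                break
            width, records = payload                      # continue in the next table
    return symbols
```

For the code {0, 10, 110, 111} this produces three tables with two records each, and decoding never inspects a single bit more than once.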

The described algorithm uses substantially less space than the
classical Huffman tree approach and allows for faster
decompression. Our algorithm is also faster than sk-tree decoding,
because we always read groups of bits and not individual bits.

Further, we suggest a special finite-automaton-based data
structure, which allows reading of up to $i$ bits from the
input stream without using bit-oriented operations. This finite
automaton has states corresponding to all binary sequences
$b$ with length between 1 and 8. The input alphabet consists
of integers between 1 and $i$; the output alphabet consists of
pairs $(v, j)$, where $j$ is an integer between 1 and $i$. The states of
the automaton correspond to the "rest" of the current byte, i.e., the part that
is not yet processed. The input $l$ indicates the number of bits to
be read, $v$ is the value of the bits read, and $j$ is the number of bits
that should be read at the next step. Suppose the automaton is
in a state $s$ corresponding to the bit sequence $b$ of length $k$,
and the input integer is $l$. If $l > k$, the automaton outputs the pair
$(v, l - k)$, where $v$ is the value of $b$ shifted $l - k$ bits left. If
$l < k$, the automaton outputs the pair $(v, l - k)$, where $v$ is the value
of $b$ shifted $k - l$ bits right. Thus, in the output pair $(v, j)$ the
first component $v$ is the value of "as many as possible" bits read
from the current byte, and $j$ indicates the number of bits
which should be read from the next bytes.
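One possible reading of this automaton is sketched below. Several details are assumptions and not taken from the text: a state is stored as the string of still unread bits of the current byte, the second output component is taken to be 0 once a request is fully served, requests are limited to 8 bits, and the names `build_automaton` and `read_fields` are hypothetical. Shifts are used only while the transition table is precomputed; reading uses table look-ups and additions.

```python
def build_automaton(max_request=8):
    """Precompute transitions[(state, l)] = (v, j, next_state)."""
    transitions = {}
    for k in range(1, 9):                       # k = number of unread bits
        for x in range(1 << k):
            b = format(x, "0{}b".format(k))     # state: unread bits of the byte
            for l in range(1, max_request + 1):
                if l > k:
                    # All k bits are consumed; l - k bits are still missing.
                    transitions[(b, l)] = (x << (l - k), l - k, "")
                else:
                    # The request is served from the current byte alone.
                    transitions[(b, l)] = (x >> (k - l), 0, b[l:])
    return transitions


def read_fields(data, widths, transitions):
    """Read consecutive bit fields of the given widths from a bytes object."""
    stream = iter(data)
    state, values = "", []
    for width in widths:
        value, need = 0, width
        while need > 0:
            if state == "":
                state = format(next(stream), "08b")   # load the next byte
            v, need, state = transitions[(state, need)]
            value += v
        values.append(value)
    return values
```

For the bytes 10110011 01000000 and the requests 3, 3, 4 the sketch returns the values of 101, 100 and 1101, where the last field spans the byte boundary.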

The algorithm and data structure described in this work
allow fast decoding of Huffman codes and can be implemented
efficiently without using bit operations.
Abstract — We prove that the maximum data expansion
of Huffman coding is at most 0.83485 bits per
symbol, improving on the previous best known bound
of 1.256 bits per symbol. Our bound is very close to
the 0.8 bits per symbol conjectured by Cheng et al.

I.  Problem  Definition 

The data expansion problem for Huffman codes was first
proposed by Cheng et al. [1] and has been investigated in [1]-[5].
Let $N$ be the size of the source alphabet and assume a binary
alphabet for the codewords. Denote the probability and the
Huffman codeword length of the $i$-th source symbol by $p_i$ and
$l_i$, respectively. The data expansion of the code is defined as
follows:

$$\delta = \sum_{\{i \,\mid\, l_i > \lceil \log_2 N \rceil\}} p_i \left( l_i - \lceil \log_2 N \rceil \right). \qquad (1)$$

Data expansion is a measure of the temporary increase of
the file size in the worst case if the Huffman codewords
replace the fixed-length codewords sequentially and "in
place" [1]. It is also a measure of the penalty for using long
codewords for less likely symbols if we ignore the benefit
we get from using short codewords for more likely symbols.
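The data expansion of a concrete source can be computed directly from this definition. The sketch below builds an ordinary Huffman code (ties are broken by insertion order, which is an arbitrary choice of this example) and sums $p_i (l_i - \lceil \log_2 N \rceil)$ over the codewords longer than $\lceil \log_2 N \rceil$; the function names are hypothetical.

```python
import heapq


def huffman_lengths(probs):
    """Return the Huffman codeword length of every symbol."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)                      # unique tie-breaker for merged nodes
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:        # every merge adds one bit
            lengths[symbol] += 1
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return lengths


def data_expansion(probs):
    """Sum of p_i * (l_i - ceil(log2 N)) over codewords longer than ceil(log2 N)."""
    threshold = (len(probs) - 1).bit_length()  # equals ceil(log2 N) for N >= 1
    lengths = huffman_lengths(probs)
    return sum(p * (l - threshold) for p, l in zip(probs, lengths) if l > threshold)
```

For the probabilities (1/2, 1/4, 1/8, 1/8) the lengths are (1, 2, 3, 3) and $\lceil \log_2 4 \rceil = 2$, so only the two least likely symbols contribute and the data expansion is 0.25 bits per symbol.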

The goal of this paper is to find a universal upper bound
on the data expansion, for any number of codewords and any
probability distribution. The conjectured and the best known
bounds are 0.8 [1] and 1.256 [5] bits per symbol, respectively.

II.  Canonical  Ordered  Huffman  Code  Trees 
and  Sufficient  Sets 

A  canonical  ordered  Huffman  code  tree  is  an  ordered  Huffman 
code  tree  [6]  in  which  the  probability  of  every  intermediate 
node  is  no  less  than  the  probability  of  every  terminal  node  at 
the  same  level. 

Define $\mathcal{S}$ as the space of all possible combinations of Huffman
code tree structures and probability distributions. A subset
$\mathcal{S}'$ of $\mathcal{S}$ is called a sufficient set if for any data expansion $\delta$
achieved by some element of $\mathcal{S}$, there exists an element of $\mathcal{S}'$
with data expansion at least as large as $\delta$.

III.  Data  Expansion  Upper  Bound 

Theorem  1  The  set  of  Huffman  code  trees  with  the  following 
properties  is  a  sufficient  set  for  the  maximum  data  expansion 
problem  of  Huffman  codes. 

1.  The  Huffman  code  tree  is  canonical  ordered. 

2.  There  is  at  most  one  codeword  at  each  level  up  to  level 
log2  N,  and  the  probability  of  each  such  codeword  is 
equal  to  the  maximal  node  probability  at  the  next  level. 

3.  The  only  codewords  at  levels  greater  than  log2  N  are  at 
the  largest  level  and  the  second  largest  level. 

4. Either all codewords at the largest and second largest
levels  have  the  same  probability  (case  A)  or  codewords 
at  the  second  largest  level  have  twice  the  probability  of 
the  codewords  at  the  largest  level  (case  B). 


It is sufficient to consider only $N$ a power of 2, say $N = 2^m$.
Consider any Huffman code from the sufficient set of
Theorem 1. Let $N'$ be the number of intermediate nodes at
level $m$. Let $K$, $K \le m$, be the number of codewords of
length at most $m$. Denote the largest level as $L$. Define
$\beta = (N - K)/N'$. It can be shown that

$$L = m + \gamma + \lfloor \log_2 \beta \rfloor, \qquad (2)$$

where $\gamma$ is 0 if the number of codewords at level $L - 1$ is 0 and
1 otherwise. It can then be shown that the data expansion is

$$\delta = \begin{cases} \left( \lfloor \log_2 \beta \rfloor + 2\left(1 - \frac{1}{\beta}\, 2^{\lfloor \log_2 \beta \rfloor}\right) \right) P_A, & \text{case A}, \\ \left( \lfloor \log_2 \beta \rfloor - 1 + \frac{\beta}{2^{\lfloor \log_2 \beta \rfloor}} \right) P_B, & \text{case B}, \end{cases} \qquad (3)$$

where $P_A$ and $P_B$ represent the total probability that
contributes to the data expansion in each case. We prove that


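$$P_A \le P_B, \qquad (4)$$

$$P_B = \frac{2N'(\theta_1 - \theta_2)}{2^{m-l}(\theta_1^{l} - \theta_2^{l}) + (2^{m-l} + N')(\theta_1^{l+1} - \theta_2^{l+1})} \qquad (5)$$

$$\le \frac{2(\theta_1 - \theta_2)}{2^{-l}\beta(\theta_1^{l} - \theta_2^{l} + \theta_1^{l+1} - \theta_2^{l+1}) + (\theta_1^{l+1} - \theta_2^{l+1})}, \qquad (6)$$

where $\theta_1 = (1 + \sqrt{5})/2$, $\theta_2 = (1 - \sqrt{5})/2$, and $l$ is defined as
$\min\{j \mid n_j = 0\} - 1$, where $n_j$ is the number of codewords at
level $j$. It can be shown that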


l  = 


(7) 


Using  (3),  (4),  (6),  (7),  and  the  fact  that  the  maximum  data 
expansion  is  monotonically  increasing  with  m,  we  show  that 
the  maximum  data  expansion  for  case  A  and  B  is  at  most 
0.83485  and  0.8  bits  per  symbol  respectively. 


Theorem  2  The  maximum  data  expansion  of  Huffman  codes 
is  at  most  0.83485  bits  per  symbol. 
Abstract — An efficient implementation of a Huffman
code is based on the Shannon-Fano construction.
An important question is: how complex is such an
implementation? In the past, authors have considered
this question assuming an ordered source symbol
alphabet. For the compression of blocks of binary
symbols this ordering must be performed explicitly,
and it turns out to be the complexity bottleneck.

I.  The  Huffman-Shannon-Fano  code 

We consider a binary, memoryless source with
$\Pr\{X_i = 1\} = p < 1/2$. The Huffman-Shannon-Fano (HSF)
codes [1] that we shall consider are described as follows.

First assign to each block $x^n$ a unique index $i(x^n) \in
\{0, 1, \ldots, 2^n - 1\}$ such that for all pairs of blocks $x^n, y^n \in
\{0,1\}^n$ we have $i(x^n) < i(y^n) \Rightarrow P(x^n) \ge P(y^n)$. Let
$w = (w_0, w_1, \ldots)$ be a vector of code word lengths such that:
- $w$ satisfies Kraft's inequality with equality;
- $E\{W\} = \sum_{x^n \in \{0,1\}^n} P(x^n)\, w_{i(x^n)}$ is minimal;
- for all $i, j \in \{0, 1, \ldots, 2^n - 1\}$, $i < j \Rightarrow w_i \le w_j$.
So, $w$ is a non-decreasing sequence given the index ordering $i(x^n)$.

We now briefly introduce the Huffman-Shannon-Fano encoding
procedure. Given the code word lengths $w$, we
determine the number of codewords $v(i)$ of a given length $i$.
We shall use the notation $w_-$ for the shortest and $w_+$ for the
longest code word length.

From Nemetz and Simon [3] we know that for all $x^n$

$$|w_{i(x^n)} + \log_2 P(x^n)| = o(n), \qquad (1)$$

and with the fact that $-\log_2 \Pr\{0^n\} = -n\log_2(1 - p)$ and
$-\log_2 \Pr\{1^n\} = -n\log_2 p$ we obtain that

$$w_+ = -n\log_2 p + o(n), \qquad (2)$$

$$w_- = -n\log_2(1 - p) + o(n). \qquad (3)$$

Now we can compute the 'base' values for $w \in \{w_-, \ldots, w_+\}$ by

$$base(w) = \sum_{j=w_-}^{w-1} \left( v(j)\, 2^{w-j} - v(j) \right).$$

The encoding procedure is now as follows. Given a source
sequence $x^n$ do:
- Determine the index $i = i(x^n)$;
- Determine the code word length $w = w_i$;
- Produce the code word for $x^n$ from the binary representation
of $base(w) + i$ in $w$ bits.
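The codeword generation step can be illustrated with a few lines of code. The sketch assumes the base values are $base(w) = \sum_{j=w_-}^{w-1} v(j)(2^{w-j} - 1)$, which is the form that makes $base(w) + i$ the $i$-th codeword of a canonical code whose shorter codewords come first; the function name is hypothetical.

```python
from collections import Counter


def hsf_codewords(lengths):
    """Codewords for indices 0, 1, ... given non-decreasing lengths w_0 <= w_1 <= ...

    The lengths are assumed to satisfy Kraft's inequality with equality.
    """
    count = Counter(lengths)                      # v(j): number of codewords of length j
    w_min, w_max = lengths[0], lengths[-1]
    base = {}
    for w in range(w_min, w_max + 1):
        # Assumed form of the base value (see the lead-in).
        base[w] = sum(count[j] * (2 ** (w - j) - 1) for j in range(w_min, w))
    # Codeword of index i: binary representation of base(w_i) + i in w_i bits.
    return [format(base[w] + i, "0{}b".format(w)) for i, w in enumerate(lengths)]
```

For the lengths (1, 2, 3, 3) the base values are 0, 1 and 4, which yields the codewords 0, 10, 110 and 111.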

II.  Complexity  considerations 

Storage complexity: We shall consider only the storage
requirements for the encoding (and decoding) of a block $x^n$.
So, we do not take into account the cost of the preprocessing
(designing the code).

1This  work  was  performed  during  a  visit  at  the  Information 
Technology  Department  of  Lunds  Tekniska  Hogskola,  Sweden. 


Time complexity: We require that the total time complexity
is $O(n)$. Again we only consider the encoding and decoding
cost and not the preprocessing cost.

Usually,  but  not  always,  we  can  interchange  storage  and 
time  complexity  by  adding  more  units  to  perform  more  oper¬ 
ations  in  parallel  thus  increasing  the  storage  complexity  while 
decreasing  the  time  complexity  and  vice  versa. 

III.  Conclusions 

The storage complexity of the HSF code is bounded by the cost
of indexing the source sequence. This fact is ignored
in the computer science literature, which is concerned
with an efficient determination of the codeword lengths. However,
that is a one-time problem, while the encoding
and decoding need the indexing once per codeword.

Summarizing  the  complexity: 

• The cost of the code word generation. When we store
the base array, the time complexity is $O(1)$ and the storage
cost is $O(n^2)$ bits. We also showed that it is possible
to compute the base values when we need them in $O(n)$
time. The storage cost then is $O(n)$ because we must
save the resulting codeword.

• The cost of the code word length array. We described an
algorithm that produces the required code word length
from the source sequence index in $O(\log n)$ time and
$O(n^2)$ storage space.

• The index computation is still an open question. Using
enumerative techniques similar to [2] we have two options:
either we use a table of binomial coefficients (Pascal's
triangle), or we compute the required coefficients.
Pascal's triangle requires $O(n^3)$ bits of storage, and the
computation of the index then costs $O(n)$ time. Computing
the coefficients requires $O(n)$ divisions that must
be performed sequentially, thus resulting in a time
complexity of $O(n^2)$, which is unacceptable.
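An enumerative index of the kind mentioned in the last item can be sketched as follows. The particular ordering is an assumption made for illustration: blocks are ranked by increasing number of ones (for $p < 1/2$ this is non-increasing probability), and blocks with the same number of ones are ranked lexicographically. The function name is hypothetical, and `math.comb` stands in for a stored Pascal's triangle.

```python
from math import comb


def block_index(bits):
    """Index i(x^n) of a block given as a list of 0/1 values."""
    n, k = len(bits), sum(bits)
    # All blocks with fewer ones are more probable and come first.
    index = sum(comb(n, j) for j in range(k))
    ones_left = k
    for position, bit in enumerate(bits):
        if bit == 1:
            # Count blocks that agree so far but have a 0 at this position.
            index += comb(n - position - 1, ones_left)
            ones_left -= 1
    return index
```

For $n = 3$ the all-zero block gets index 0, the blocks 001, 010 and 100 get the indices 1, 2 and 3, and the all-one block gets index 7. Each index costs $O(n)$ additions once the binomial coefficients are available.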
Abstract — An approach towards an information-theoretic
analysis and design of cache memories is presented.
Computer systems usually have a slow main memory and a
faster cache memory, which is usually much smaller than the
main memory because of its cost. A somewhat similar situation
is also present in Internet servers. A miss happens whenever an
item is not found in the cache memory. If so, the
item is fetched from the main memory and placed
in the cache. A hit is obtained whenever the item
is found in the cache. Usually the cost of a miss is
several times that of a hit. The goal is to find strategies
for the many-to-one mapping of addresses of the
main memory to the cache memory, as well as
replacement strategies. Usual replacement strategies
are Least Recently Used (LRU), Random Replacement
(RR), etc. The main goal is to obtain strategies that
optimize the running time of the program under
execution. Since almost all programs use branch
instructions and loops, some of the information-theoretic
approaches previously introduced consider the
prediction of the result of a branch based on its past
behavior. Here a different approach is considered.
In particular, the opposite case is analyzed, i.e., a
linear loop that is executed indefinitely, under a combined
Random Replacement and Least Recently Used strategy.
It is shown that this model is equivalent to that of classifying N
objects in M classes with at least c objects in each
class, and that this problem gives a generalization of
the Ehrenfests' urn model used in Statistical Thermodynamics
in connection with the Boltzmann H-theorem.
In that sense, a combinatorial generalization
of the Stirling Numbers of the Second Kind is
presented as the number of partitions of a set with n
elements in m subsets with at least c elements each.
Combinatorial properties and a recursive relation are
obtained. The generating function is obtained as the
m-th power of an exponential series expansion truncated
at c. Asymptotic results are given for n going to
infinity, with m fixed, and with n/m constant, from
which, in particular, the Stirling Formula is obtained.
The connection with some large deviation theory results
is discussed, as well as the relation with the
minimum variance unbiased estimator of Poisson
distributions truncated at c. The solution of the linear
loop model is given in terms of a Markov chain
which generalizes the Ehrenfests' urn model. Finally,
it is discussed how the results obtained so far suggest
an information measure for the behavior of arbitrary
programs and a Bayesian approach to cache memory
optimization.

This work was partially supported by the Universidad de
Buenos Aires, grant No. TI-09, and the Consejo Nacional de
Investigaciones Científicas y Técnicas, grant No. PIP-4030, CONICET,
Argentina.
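The generalized Stirling numbers mentioned in the abstract (partitions of an n-element set into m subsets with at least c elements each) can be computed with a simple recursion. The abstract does not state its recursive relation, so the sketch below uses the standard one for such "associated" Stirling numbers, which is an assumption of this example: element n either joins one of the m blocks of a valid partition of the other elements, or it forms a block of exactly c elements together with c - 1 chosen companions. The function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling_at_least(n, m, c):
    """Number of partitions of an n-set into m blocks, each of size >= c (c >= 1)."""
    if n == 0 and m == 0:
        return 1
    if m == 0 or n < c * m:
        return 0
    return (m * stirling_at_least(n - 1, m, c)
            + comb(n - 1, c - 1) * stirling_at_least(n - c, m - 1, c))
```

For c = 1 the recursion reduces to the one for the ordinary Stirling numbers of the second kind, e.g. `stirling_at_least(4, 2, 1)` is 7, while `stirling_at_least(4, 2, 2)` is 3 (the three ways to split four elements into two pairs).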
Abstract — A new approach to investigating the
Mastermind game and related problems, among them
uniquely decodable codes for the noiseless adder channel,
based on ideas and methods of coding theory, is proposed.
This approach leads to improved bounds in
various problems associated with the rigidity of Hamming
spaces.

I.  INTRODUCTION 

We call a set $B$ a base of a metric space $L$ if every point
of $L$ is uniquely determined by its distances to the points of
$B$. The minimal possible number of points of a base is called
the rigidity of the metric space and is denoted by $r(L)$. Let
$H_{n,q}$ be the $n$-dimensional $q$-ary Hamming space and $r_{n,q}$ be
its rigidity. This notion was introduced in 1963 by P. Erdős
and A. Rényi [1] for solving the following weighing problem:
what is the minimal number $W(n)$ of weighings on an accurate
scale needed to determine all counterfeit coins in a set of $n$ coins? It
is easy to see that the minimal number $W_d(n)$ of weighings
for deterministic strategies differs from $r_{n,2}$ by at most
one. Note that this problem is equivalent to the problem
of uniquely decodable codes for the noiseless $n$-user adder channel
[2].
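For very small parameters the rigidity can be found by exhaustive search, which makes the definition of a base concrete. The following brute-force sketch is only practical for tiny $n$ and $q$; the function names are hypothetical.

```python
from itertools import combinations, product


def hamming(x, y):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(x, y))


def is_base(base, points):
    """True if the distance vectors to `base` are distinct for all points."""
    signatures = {tuple(hamming(x, b) for b in base) for x in points}
    return len(signatures) == len(points)


def rigidity(n, q):
    """Smallest size of a base of the n-dimensional q-ary Hamming space."""
    points = list(product(range(q), repeat=n))
    for size in range(1, len(points) + 1):
        if any(is_base(base, points) for base in combinations(points, size)):
            return size
```

For the binary square the search gives `rigidity(2, 2) == 2`: a single point cannot separate its two neighbours, whereas the pair {00, 01} produces four distinct distance vectors.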

Many mathematicians have worked on the game of Mastermind.
For instance, in 1977 D. Knuth proved [3] that 4
questions suffice to determine a hidden "code", i.e., a word $x$
of length $n$ in an alphabet of $q$ elements, for $n = 4$, $q = 6$.
Denote by $m(n, q)$ the minimal number of queries needed to determine
any "hidden" word $x$, and by $m_d(n, q)$ the minimal number
of queries for the case of deterministic strategies. Obviously,
$q - 1$ queries are enough to find the composition of the word $x$.
Therefore, $r_{n,q} - (q - 1) \le m_d(n, q) \le r_{n,q}$, and the asymptotic
behavior of both values $m_d(n, q)$ and $r_{n,q}$ is the same, at
least for the case $n \gg q$.

II. The rigidity of the Hamming space: the case q is fixed

Obviously, $r_{n,q} \ge \frac{n}{\log_q(n+1)}$, because the number of possible
distances is not more than $n + 1$. A straightforward generalization
of [1] gives a bound that is twice as good:

$$r_{n,q} \ge \frac{2n}{\log_q n}(1 + o(1)). \qquad (1)$$

By considering a random base of $H_{n,q}$, V. Chvátal [4] proved
that $r_{n,q} \le C(q)\frac{n}{\log_q n}(1 + o(1))$, where $C(q) = 2(2 + \log_q 2)$.
More precise calculations show that

$$r_{n,q} \le c(q)\frac{n}{\log_q n}(1 + o(1)), \qquad (2)$$

1This  work  was  supported  by  the  NSF  grant  NCR-9703844 
where $c(q) = 2\log_q(1 + (q - 1)q) < 4 < C(q)$.

It was proved in [5], [6] that $r_{n,2} = \frac{2n}{\log_2 n}(1 + o(1))$. We
prove that

Theorem 1: For $q = 3, 4$

$$r_{n,q} = \frac{2n}{\log_q n}(1 + o(1)).$$

III. The rigidity of the Hamming space, Mastermind and "Bulls and Cows" games: the case n = q

Let us consider the case $n = q$. Further, consider among all
$n$-letter words the set $S_n$ of words without repetitions of
symbols, i.e., permutations. This space corresponds to another
famous (and much older, see [3]) game, "Bulls and Cows".
As in the last section, a random choice of a base and entropy
techniques prove the following result.

Theorem 2:

$$\Omega(n \log_2 n) \le r(H_{n,n}),\ r(S_n) \le 4n \log_2 n\,(1 + o(1)).$$

IV.  Conclusion 

We show how information theory techniques can be useful
for investigating several long-standing problems. However,
the central question, "what is the rigidity of the $q$-ary Hamming
space?", remains open for $q > 4$. We conjecture that, as in
the binary case, random choice does not give the final answer
to the problems considered. We have considered the case
of deterministic strategies of the games. Much less is known
about adaptive strategies, which correspond, in particular,
to the noiseless adder channel with feedback. Another interesting
direction is the relationship of these problems with superimposed
codes on the $n$-dimensional cube (i.e., -space), see [7].
Abstract — A transmission strategy that allows the
sender to deliver any of $M$ messages to the receiver
over a binary channel when at most $e$ errors can occur
is presented. The total number of bits required by
the strategy differs from the known lower bound by
$3e$. This statement simultaneously gives a new upper
bound on the number of questions in the process of
searching with lies known as "Ulam's game".

I.  Introduction  and  formulation  of  the  result 

Let $[M] = \{1, \ldots, M\}$ denote the set of messages. One of
them should be transmitted over a binary channel where at
most $e$ errors $0 \to 1$ and $1 \to 0$ can occur. For all $\tau = 1, 2, \ldots$,
the sender noiselessly observes the received bit $y_\tau$ and sends
the next bit $x_{\tau+1}$ based on the message $m \in [M]$ and the bits
$x_1, \ldots, x_\tau$, $y_1, \ldots, y_\tau$ which were transmitted and received at
the previous time instants. A transmission strategy is an
algorithm for computing the bits $x_1, \ldots, x_n$ by the sender and
for decoding the message $m$ by the receiver, where $n$ is the
total number of transmitted bits. The equivalent problem can
be formulated as searching for an integer $m \in [M]$ by asking
questions when at most $e$ lies are allowed in the answers: the
question $Q$ is a subset of the set $[M]$; the answer is either 0 or
1, and the answers 0 if $m \in Q$ and 1 if $m \notin Q$ are considered
as lies.

Theorem: There exist searching strategies such that any
of $M$ integers can be discovered with $n^* = n + 3e$ questions
when at most $e$ lies are allowed in the answers, where $n$ is
the minimal integer satisfying the inequality $M V_e(n) \le 2^n$ and
$V_e(n) = \sum_{i=0}^{e} \binom{n}{i}$.

II.  Basic  ideas  of  the  proof 

Suppose that $M_0, \ldots, M_e$ are pairwise disjoint subsets of
the set $[M]$ constructed by the questioner in such a way
that if $m \in M_j$, then $j$ lies are allowed in all further answers,
$j = 0, \ldots, e$. Then the vector $c = (|M_0|, \ldots, |M_e|)$
can be interpreted as the state of the search. Let $\mathcal{D}(c)$ denote
the set consisting of integers $\delta$ such that $|\delta| \le c$ and
$\delta \equiv c \pmod 2$. For all $\delta = (\delta_0 \in \mathcal{D}(c_0), \ldots, \delta_e \in \mathcal{D}(c_e))$,
let $a(c|\delta) = (a_0, \ldots, a_e)$, $b(c|\delta) = (b_0, \ldots, b_e)$, where
$a_j = (c_j + c_{j+1} + \delta_j - \delta_{j+1})/2$, $b_j = (c_j + c_{j+1} - \delta_j + \delta_{j+1})/2$ and
$j = 0, \ldots, e$ (we assume that $c_{e+1} = \delta_{e+1} = 0$). Let the vector
$\delta$ specify the question $Q = \bigcup_{j=0}^{e} Q_j$, where $Q_j$ is the subset
consisting of the $(c_j + \delta_j)/2$ smallest elements of the subset $M_j$ for all
$j = 0, \ldots, e$. We will call this procedure "partitioning of the
subsets $M_0, \ldots, M_e$ in accordance with the vector $\delta$". Let
$Q_{e+1} = M_{e+1} = \emptyset$ and $\bar{Q}_j = M_j \setminus Q_j$ for all $j = 0, \ldots, e + 1$.
If the answer is 1, the new vector of sets has components
$M_j' = Q_j \cup \bar{Q}_{j+1}$ and $a(c|\delta)$ is the new state of the search.
If the answer is 0, the new vector of sets has components
$M_j'' = \bar{Q}_j \cup Q_{j+1}$ and $b(c|\delta)$ is the new state of the search.
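The state bookkeeping can be checked with a small script. It is a sketch under stated assumptions: $V_j(t)$ is taken to be the size $\sum_{i \le j} \binom{t}{i}$ of a Hamming ball of radius $j$ in $t$ dimensions, $a_j = (c_j + c_{j+1} + \delta_j - \delta_{j+1})/2$ and $b_j = (c_j + c_{j+1} - \delta_j + \delta_{j+1})/2$ with $c_{e+1} = \delta_{e+1} = 0$, and all function names are hypothetical.

```python
from math import comb


def ball(j, t):
    """Assumed V_j(t): number of binary words of length t with at most j ones."""
    return sum(comb(t, i) for i in range(j + 1))


def min_questions_bound(M, e):
    """Smallest n with M * V_e(n) <= 2**n (the bound used in the Theorem)."""
    n = 0
    while M * ball(e, n) > 2 ** n:
        n += 1
    return n


def next_states(c, delta):
    """Return (a, b): the states after the answers 1 and 0, respectively."""
    c_ext, d_ext = list(c) + [0], list(delta) + [0]
    a = [(c_ext[j] + c_ext[j + 1] + d_ext[j] - d_ext[j + 1]) // 2 for j in range(len(c))]
    b = [(c_ext[j] + c_ext[j + 1] - d_ext[j] + d_ext[j + 1]) // 2 for j in range(len(c))]
    return a, b


def weight(c, t):
    """Sum of c_j * V_j(t): must not exceed 2**t if t questions are to suffice."""
    return sum(cj * ball(j, t) for j, cj in enumerate(c))
```

For $M = 10^6$ and $e = 1$ the bound evaluates to $n = 25$. The weight is conserved by every question, i.e. `weight(c, t) == weight(a, t - 1) + weight(b, t - 1)`, which follows from $a_j + b_j = c_j + c_{j+1}$ and Pascal's rule.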

A key point of our considerations is the introduction of
a special class of rooted regular binary trees whose nodes
"contain" possible states of the search. This construction
relates binary trees to coverings of the Hamming spaces by
sets whose cardinalities coincide with the sizes of the
Hamming balls, and leads to the following statement, which
is essentially used in the searching strategy. Let $M_0, \ldots, M_e$
be the current pairwise disjoint subsets constructed by the
questioner and let $t$ be the minimal integer satisfying the
inequality $\sum_{j=0}^{e} |M_j|\, V_j(t) \le 2^t$. Then the searching problem
under consideration cannot be solved using fewer than $t$ questions,
and a solution with $t$ questions exists only if this problem can
be solved using $t$ questions when some set $M_0'$, assigned in
such a way that $|M_0'| = 2^t - \sum_{j=1}^{e} |M_j|\, V_j(t)$, $M_0' \supseteq M_0$,
$M_0' \cap M_1 = \ldots = M_0' \cap M_e = \emptyset$, is substituted for $M_0$.

To prove the theorem we present a specific algorithm for
assigning the vector $\delta$ for any vector $c$ that can be obtained by
the questioner and denote this vector by $\delta^*(c)$. Let $\eta$ denote
the index of the last positive component of the vector $c$. If
$c_\eta = 1$, then $\delta^*_1 := c_1, \ldots, \delta^*_{\eta-1} := c_{\eta-1}$; $\delta^*_\eta := 1$;
$\delta^*_{\eta+1} := \ldots := \delta^*_e := 0$. If $c_\eta > 1$, then


5j  :=  arg  min 
5€D(Cj) 


for  j  =  e, . . . ,  1.  In  both  cases,  <5d  (,~*)(SJ*. 

The searching strategy is given below, where the vector of
length $e + 1$ containing 1 in the $j$-th position and 0's in all
other positions is denoted by $\mathbf{1}_j$, $j = 0, \ldots, e$.

1.  co  :=  2n  -  MVe(n);  Ado  :=  .  - .  :=  Ade_i  :=  0;  Ade  := 

m 

2. $c := (c_0, |M_1|, \ldots, |M_e|)$. If $c \in \{\mathbf{1}_0, \ldots, \mathbf{1}_e\}$, then go
to 5.

3.  Construct  the  vector  d*(c)  =  (<5q  , . . .  ,  <5* )  using  the  al¬ 
gorithm  described  above  with  the  value  of  t  determined 
by  the  equation  Y^j=oci^j^  ~  2<-  K  I  I  >  Co,  then 
co  :=  c0  +  Y,)=o  CJ  (  )  )  and  S°  t0  2- 

4. Partition the subsets $M_0, \ldots, M_e$ in accordance with
the vector $\delta^*(c)$. If the answer is 1, then
$c_0 := (c_0 + c_1 + \delta^*_0 - \delta^*_1)/2$ and $M_j := M'_j$, $j = 0, \ldots, e$. If the answer
is 0, then $c_0 := (c_0 + c_1 - \delta^*_0 + \delta^*_1)/2$ and $M_j := M''_j$,
$j = 0, \ldots, e$. Go to 2.

5.  End  :  the  singleton  Mj,  where  j  is  such  that  Cj  =  1, 
contains  the  defined  integer. 
Abstract - Let $q_e(m)$ be the smallest integer $q$ satisfying
Berlekamp's bound $\sum_{i=0}^{e} \binom{q}{i} \le 2^{q-m}$ [1]. We
prove that for any fixed $e > 1$ and all sufficiently large
$m$ there is a binary searching strategy to guess a number
$x \in \{0, \ldots, 2^m - 1\}$ in spite of up to $e$ lies in the
answers, which uses exactly $q_e(m)$ questions and
adaptiveness only once. The strategy goes through a first
batch of $m$ non-adaptive questions asking for the bits
of the binary expansion of $x$ and then, only depending
on the answers to these questions, a second batch of
$q_e(m) - m$ non-adaptive questions.
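
As an illustration of the quantity $q_e(m)$, the following Python sketch searches for the smallest $q$ satisfying Berlekamp's bound as stated above. The function name `berlekamp_q` is our own choice, not notation from the paper.

```python
from math import comb


def berlekamp_q(e, m):
    """Smallest q with sum_{i=0}^{e} C(q, i) <= 2^(q - m)."""
    q = m  # fewer than m questions cannot even distinguish 2^m numbers
    while sum(comb(q, i) for i in range(e + 1)) > 2 ** (q - m):
        q += 1
    return q


# With no lies, m questions suffice; e = 1, m = 4 matches the length of
# the [7, 4] Hamming code, and e = 3, m = 12 the length of the binary
# Golay code.
print(berlekamp_q(0, 5), berlekamp_q(1, 4), berlekamp_q(3, 12))
```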

I.  Introduction 

We consider the following scenario: Two players, called
Questioner (Q) and Responder (R), first agree on fixing an integer
$m$ and a search space $S = \{0, \ldots, 2^m - 1\}$. Then R thinks of
a number $x_* \in S$ and Q must find out $x_*$ by asking questions
to which R can only answer yes or no. It is agreed that R
is allowed to lie at most $e$ times. We are interested
in the problem of determining the minimum number of
questions Q has to ask in order to infallibly guess the number $x_*$.
This problem was posed by Ulam [7] and Rényi [4], and has
been intensively investigated in the last decades (see [2] for a
survey).

In the fully adaptive case, i.e., when the $i$th question is
asked knowing the answer to the $(i - 1)$th question, a remarkable
result of Spencer [5] shows that $q_e(m)$ questions are necessary
and sufficient, up to finitely many exceptional $m$'s. At
the other, totally non-adaptive extreme, when all the questions
are asked at the outset, before knowing any answer, a series
of negative results culminating in the paper by Tietäväinen [6]
shows that searching strategies with exactly $q_e(m)$ questions
(or equivalently, perfect binary $e$-error-correcting codes with
$2^m$ codewords of length $q_e(m)$) are sporadic exceptions for
$e \le 3$, and do not exist for $e > 3$, except in trivial cases.

Our main result, stated in the abstract, says that for each
$e$, and for all sufficiently large $m$, searching strategies do exist
having the least possible degree of adaptiveness (just once)
and using exactly $q_e(m)$ questions. Since Q can adapt his
strategy only once, our paper yields $e$-fault tolerant search
strategies with minimum adaptiveness and the least possible
number of tests.

II.  The  Two-Round  Strategy 

By a yes-no question we simply mean an arbitrary subset $T$
of $S$. If the answer to the question $T$ is “yes”, numbers in
$T$ are said to satisfy the answer, while numbers in $S \setminus T$
falsify it. At any time Q's state of knowledge is represented
by an $(e + 1)$-tuple $\sigma = (A_0, A_1, \ldots, A_e)$ of pairwise
disjoint subsets of $S$, where $A_i$ is the set of numbers falsifying
exactly $i$ answers, $i = 0, 1, 2, \ldots, e$. The type of $\sigma$ is the
$(e + 1)$-tuple $(|A_0|, |A_1|, \ldots, |A_e|)$. Moreover, $\sigma$ is a final state
iff $|A_0 \cup A_1 \cup \cdots \cup A_e| \le 1$. For any state $\sigma = (A_0, \ldots, A_e)$
and question $T \subseteq S$, the two states $\sigma^{yes}$ and $\sigma^{no}$ respectively
resulting from a positive or a negative answer are given by
$\sigma^{yes} = (A_0^{yes}, \ldots, A_e^{yes})$ and $\sigma^{no} = (A_0^{no}, \ldots, A_e^{no})$ where,
setting $A_{-1} = \emptyset$, we define $A_i^{yes} = (A_i \cap T) \cup (A_{i-1} \setminus T)$ and
$A_i^{no} = (A_i \setminus T) \cup (A_{i-1} \cap T)$ for each $i = 0, 1, \ldots, e$.

1This work was supported by Enea-Grant.

The first batch of questions is easily described as follows:
For each $i = 1, 2, \ldots, m$, let $D_i \subseteq S$ denote the question
“Is the $i$th binary digit of $x_*$ equal to 1?” Thus a number
$y \in S$ belongs to $D_i$ iff the $i$th bit $y_i$ of its binary expansion
$y = y_1 \cdots y_m$ is equal to 1.

Upon identifying 1 = yes and 0 = no, let $b_i \in \{0, 1\}$ be
the answer to question $D_i$. Let $b = b_1 \cdots b_m$. Beginning with
the initial state $\sigma = (S, \emptyset, \ldots, \emptyset)$, the resulting state as an
effect of the answers $b_1 \cdots b_m$ is $\sigma^b = (A_0, A_1, \ldots, A_e)$, where
$A_i = \{y \in S \mid d_H(y, b) = i\}$ for all $i = 0, 1, \ldots, e$. Here
$d_H(\cdot, \cdot)$ denotes the Hamming distance. Thus $\sigma^b$ has type
$(1, m, \binom{m}{2}, \ldots, \binom{m}{e})$.
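
The state dynamics and the first batch of bit questions can be simulated directly. The sketch below is illustrative only: the helper names `update` and `first_batch` are ours, and the update rule is the one we read from the definitions of the "yes" and "no" states, with the first bit taken as the most significant one.

```python
from math import comb


def update(state, question, answer):
    """Apply one answered question to a state (A_0, ..., A_e).

    state[i] is the set of numbers falsifying exactly i answers so far.
    """
    new_state = []
    for i in range(len(state)):
        prev = state[i - 1] if i > 0 else set()  # A_{-1} is empty
        if answer:  # "yes": numbers outside the question falsify it
            new_state.append((state[i] & question) | (prev - question))
        else:       # "no": numbers inside the question falsify it
            new_state.append((state[i] - question) | (prev & question))
    return new_state


def first_batch(m, e, answers):
    """Ask for the m bits of the secret number and return the final state."""
    S = set(range(2 ** m))
    state = [S] + [set() for _ in range(e)]
    for i, b in enumerate(answers):
        D_i = {y for y in S if (y >> (m - 1 - i)) & 1}
        state = update(state, D_i, b)
    return state


state = first_batch(5, 2, [1, 0, 1, 1, 0])
print([len(A) for A in state])  # type of the resulting state
```

For $m = 5$ and $e = 2$ the resulting type is $(1, 5, 10)$, that is, $A_i$ collects the numbers at Hamming distance $i$ from the answer string.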

The second batch of questions. We can prove that for all
sufficiently large $m$ there exists a second batch of $n = q_e(m) - m$
non-adaptive questions allowing Q to infallibly guess the
secret number. Here follows the key lemma.

Lemma II.1 For any fixed $e$ and all sufficiently large $m$ let
$n = q_e(m) - m$. Then there exists a family of codes
$\Gamma = \{C_0, C_1, \ldots, C_{e-1}\}$ together with integers $d_i \ge 2(e - i) + 1$
($i = 0, 1, \ldots, e - 1$) such that (i) each $C_i$ is an $(n, \binom{m}{i}, d_i)$
code [3]; (ii) $\Delta(C_i, C_j) \ge 2e - (i + j) + 1$ whenever $0 \le i < j \le e - 1$,
where $\Delta(C_1, C_2) = \min\{d_H(x, y) \mid x \in C_1, y \in C_2\}$.
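
The two distance conditions of the lemma are easy to check mechanically on small examples. The following sketch uses hypothetical helper names and a toy family of codes chosen by us for illustration; it does not verify the cardinality requirement in (i).

```python
from itertools import combinations


def hamming(x, y):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(x, y))


def code_distance(c1, c2):
    """Delta(C1, C2): minimum distance between a word of C1 and one of C2."""
    return min(hamming(x, y) for x in c1 for y in c2)


def min_distance(code):
    """Minimum distance of a code (infinite for a single codeword)."""
    if len(code) < 2:
        return float("inf")
    return min(hamming(x, y) for x, y in combinations(code, 2))


def satisfies_distance_conditions(codes, e):
    """Check d_i >= 2(e-i)+1 and Delta(C_i, C_j) >= 2e-(i+j)+1 for i < j."""
    for i, ci in enumerate(codes):
        if min_distance(ci) < 2 * (e - i) + 1:
            return False
    for i, j in combinations(range(len(codes)), 2):
        if code_distance(codes[i], codes[j]) < 2 * e - (i + j) + 1:
            return False
    return True


# Toy family for e = 2 and n = 7 (an assumption made for illustration).
good = [["0000000"], ["1111000", "0001111"]]
bad = [["0000000"], ["1110000"]]
print(satisfies_distance_conditions(good, 2),
      satisfies_distance_conditions(bad, 2))
```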

Let $f$ be any mapping associating elements in $A_i$ to codewords
of $C_i$ ($i = 0, \ldots, e - 1$) and elements of $A_e$ to $n$-bit vectors of
$\{0, 1\}^n \setminus F$. Let $f(x)_j$ be the $j$th bit of $f(x)$. Let the set $T_j \subseteq S$
be defined by $T_j = \{z \in S \mid f(z)_j = 1\}$ ($j = 1, \ldots, n$). This
makes the second batch of questions. Intuitively, $T_j$ asks “is
the $j$th bit of $f(x_*)$ equal to 1?”
I. Introduction

Welch's bound for a set of $K$ equal-energy sequences of length
$N$ ($K > N$) is defined as the lower bound on the sum of
the squared correlations between all pairs of these sequences
[1, 2]. The sets of sequences that achieve the Welch bound are
desirable in many signal design problems for multiple access
communications [2]. Here a similar bound for the set of $K$
equal-energy, time-limited signals is derived when no specific
format of signal waveforms is assumed and when the
bandwidth of the signal set is taken into account. Signal sets that
achieve the lower bound are also obtained.

Let $s(t) = [s_1(t), \ldots, s_K(t)]^T$, $0 \le t \le T$, be a vector of
$K$ unit-energy signals. The total squared correlation (TSC)
of the signal set is

$$\mathrm{TSC} = \sum_{i=1}^{K} \sum_{j=1}^{K} \left( \int_0^T s_i(t) s_j(t)\, dt \right)^2 .$$

The average root-mean-square (RMS) bandwidth $b(s(t))$ of
the signal set satisfies

$$b^2(s(t)) = \frac{1}{K} \sum_{k=1}^{K} \int_{-\infty}^{\infty} f^2 |S_k(f)|^2\, df ,$$

and the signal set is said to have an average fractional
out-of-band energy (FOBE) bandwidth $W$ at level $\eta$ if

$$\epsilon(s(t)) = \frac{1}{K} \sum_{k=1}^{K} \int_{|f| > W} |S_k(f)|^2\, df \le \eta ,$$

where $0 < \eta < 1$.
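
As a small numerical illustration of the TSC, the sketch below represents each signal by its coefficient vector with respect to some orthonormal basis, so that the correlation integral becomes a dot product. This representation and the function name `total_squared_correlation` are assumptions made for the example.

```python
import math


def total_squared_correlation(signals):
    """Sum over all ordered pairs (i, j) of the squared inner product.

    signals -- list of K coefficient vectors (unit energy) with respect
               to an orthonormal basis of the signal space
    """
    return sum(sum(a * b for a, b in zip(si, sj)) ** 2
               for si in signals for sj in signals)


# K = 3 orthonormal signals: TSC equals K.
orthonormal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# K = 3 unit-energy signals in N = 2 dimensions, 120 degrees apart:
# TSC equals the Welch bound K^2 / N = 4.5.
three_in_plane = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)]
                  for k in range(3)]

print(total_squared_correlation(orthonormal),
      total_squared_correlation(three_in_plane))
```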

II.  Results 

The  same  approach  as  in  [3]  has  been  used  to  obtain  the  fol¬ 
lowing  results. 

Proposition 1 Given $T$, $W$ and $K$. If $1 \le (2WT)^2 < (K + 1)(2K + 1)/6$,
then the minimum total squared correlation (MTSC) of the set of $K$
unit-energy signals of duration $T$ and average RMS bandwidth less
than or equal to $W$ is

$$\mathrm{MTSC} = \frac{K^2}{N} \left( 1 + \frac{5[(N + 1)(2N + 1) - 6(2WT)^2]^2}{(N - 1)(N + 1)(2N + 1)(8N + 11)} \right)$$

where $N$ is the largest integer less than or equal to $K$ such that
$(2WT)^2 \ge [(N + 1)(2N - 1)(2N + 1)] / [5(4N + 1)]$. The
MTSC is achieved by the signal set

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$,

$$\lambda_k = \frac{K}{N} \left( 1 + 5[(N + 1)(2N + 1) - 6(2WT)^2] \cdot \frac{(N + 1)(2N + 1) - 6k^2}{(N - 1)(N + 1)(2N + 1)(8N + 11)} \right)$$

for $k = 1, \ldots, N$; $\lambda_k = 0$ for $k = N + 1, \ldots, K$ and $V$ is any
$K \times K$ orthogonal matrix such that $V \Lambda V^T$ is a unit-diagonal
matrix.

If $(2WT)^2 \ge (K + 1)(2K + 1)/6$ then MTSC $= K$ and the
set of $K$ orthonormal signals achieves the MTSC.

If $(2WT)^2 < 1$ then no signal set of duration $T$ and RMS
bandwidth less than or equal to $W$ exists.

Proposition 2 Given $T$, $W$, $K$ and $0 < \eta < 1$. Let
$\{\psi_0(t), \psi_1(t), \ldots, \psi_{K-1}(t)\}$ and $\{\chi_0, \chi_1, \ldots, \chi_{K-1}\}$ be the first
$K$ time-truncated, normalized and shifted prolate spheroidal
wave functions and their eigenvalues, corresponding to
$c = \pi W T$ [4]. If $\frac{1}{K} \sum_{k=0}^{K-1} \chi_k < 1 - \eta \le \chi_0$, then the MTSC
of the set of $K$ signals of duration $T$ and average FOBE
bandwidth at level $\eta$ less than or equal to $W$ is

$$\mathrm{MTSC} = \frac{K^2}{N} \left( 1 + \frac{(\eta - v(N))^2}{u(N) - v^2(N)} \right)$$

where

$$u(N) = \frac{1}{N} \sum_{k=0}^{N-1} (1 - \chi_k)^2, \qquad v(N) = \frac{1}{N} \sum_{k=0}^{N-1} (1 - \chi_k),$$

and $N$ is the largest integer less than or equal to $K$ such that
$(1 - \chi_{N-1})(v(N) - \eta) \le u(N) - \eta v(N)$.

The MTSC is achieved by the signal set

$$s(t) = V \Lambda^{1/2} [\psi_0(t), \psi_1(t), \ldots, \psi_{K-1}(t)]^T$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$,

$$\lambda_k = \frac{K}{N} \cdot \frac{(\eta v(N) - u(N)) + (v(N) - \eta)(1 - \chi_{k-1})}{v^2(N) - u(N)}$$

for $k = 1, \ldots, N$; $\lambda_k = 0$ for $k = N + 1, \ldots, K$ and $V$ is any
$K \times K$ orthogonal matrix such that $V \Lambda V^T$ is a unit-diagonal
matrix.

If $\frac{1}{K} \sum_{k=0}^{K-1} \chi_k \ge 1 - \eta$ then MTSC $= K$ and the set of $K$
orthonormal signals achieves the MTSC.

If $1 - \eta > \chi_0$ then no signal set of duration $T$ and FOBE
bandwidth at level $\eta$ less than or equal to $W$ exists.
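
A quick numerical check of Proposition 1 is sketched below. It assumes our reading of the proposition, namely $\lambda_k = \frac{K}{N}\bigl(1 + 5A\,\frac{(N+1)(2N+1) - 6k^2}{(N-1)(N+1)(2N+1)(8N+11)}\bigr)$ with $A = (N+1)(2N+1) - 6(2WT)^2$, and $N$ the largest integer not exceeding $K$ with $(2WT)^2 \ge (N+1)(2N-1)(2N+1)/[5(4N+1)]$. All function names are hypothetical. The check confirms that the weights sum to $K$ (unit diagonal of $V \Lambda V^T$) and that their squares sum to the stated MTSC.

```python
def largest_n(K, x):
    """Largest N <= K with x >= (N+1)(2N-1)(2N+1) / (5(4N+1)); x = (2WT)^2."""
    return max(N for N in range(1, K + 1)
               if x >= (N + 1) * (2 * N - 1) * (2 * N + 1) / (5 * (4 * N + 1)))


def prop1_weights(K, N, x):
    """Diagonal entries lambda_1..lambda_K as read from Proposition 1 (N >= 2)."""
    A = (N + 1) * (2 * N + 1) - 6 * x
    D = (N - 1) * (N + 1) * (2 * N + 1) * (8 * N + 11)
    head = [K / N * (1 + 5 * A * ((N + 1) * (2 * N + 1) - 6 * k * k) / D)
            for k in range(1, N + 1)]
    return head + [0.0] * (K - N)


def prop1_mtsc(K, N, x):
    """MTSC as read from Proposition 1 (N >= 2)."""
    A = (N + 1) * (2 * N + 1) - 6 * x
    D = (N - 1) * (N + 1) * (2 * N + 1) * (8 * N + 11)
    return K * K / N * (1 + 5 * A * A / D)


K, x = 4, 3.0            # example values: K signals, (2WT)^2 = 3
N = largest_n(K, x)
lam = prop1_weights(K, N, x)
print(N, sum(lam), sum(l * l for l in lam), prop1_mtsc(K, N, x))
```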

1This work is supported by the University of Manitoba Graduate
Fellowship (UMGF) and by an NSERC Operating Grant.
Abstract - This paper derives general results on
the  partial  auto-correlation  function  of  the  optimal 
spreading  sequences  to  minimize  the  average  error 
probability  under  the  Standard  Gaussian  Approxima¬ 
tion  (SGA)  and  also  provides  a  real-valued  spreading 
sequence  implementation  which  is  at  the  same  time 
optimal  and  practical. 

I.  Introduction 

System  performance  of  asynchronous  CDMA  communications 
with  single-user  matched  filter  reception  critically  depends  on 
the  auto-correlation  and  cross-correlation  of  the  spreading  se¬ 
quences.  This  paper  derives  general  results  on  the  partial 
auto-correlation  function  of  the  optimal  spreading  sequences 
to  minimize  the  average  error  probability  under  the  SGA  con¬ 
dition  without  the  assumption  of  “random  processes”  on  the 
spreading sequences [1]. Based on the ergodic theory of
dynamical systems we can design a family of optimal chaotic
spreading  sequences  and  evaluate  their  performance  analyt¬ 
ically  if  the  invariant  measure  of  the  dynamical  system  is 
known.  We  describe  a  simple  method  to  implement  spreading 
sequences  with  optimum  auto-correlation  properties  by  using 
Chebyshev  polynomials,  which  are  exact  (and  hence  ergodic) 
transformations,  and  admit  a  closed-form  invariant  measure. 

II.  Derivation  of  Optimal  Sequences 

An asynchronous CDMA system with $K$ users and
spreading factor $N$ is considered. By using a single-user
matched-filter detector, the MAI power for the $i$-th
user from all other users can be computed [2] as

$$\sigma_i^2 = \frac{1}{6N} \sum_{k \ne i} \sum_{l=1-N}^{N-1} \left[ 2C_{k,i}^2(l) + C_{k,i}(l) C_{k,i}(l + 1) \right],$$

where $C_{k,i}(l)$ is the partial cross-correlation between the $k$-th and
$i$-th sequence. Using the identities
$\sum_{l=1-N}^{N-1} C_{x,y}(l) C_{x,y}(l + n) = \sum_{l=1-N}^{N-1} C_x(l) C_y(l + n)$
and $C_k(l) = C_k(-l)$ given in [2], the MAI power can be simplified to

$$\sigma_i^2 = \frac{1}{6N} \sum_{k \ne i} \left[ 2C_k(0)C_i(0) + 4 \sum_{l=1}^{N-1} C_k(l)C_i(l) + \sum_{l=0}^{N-1} \bigl( C_k(l)C_i(l + 1) + C_k(l + 1)C_i(l) \bigr) \right].$$

With the normalization $C_i(0) = 1$, the solution that minimizes
MAI power is given by

$$C_k(l) = (-1)^l \, \frac{r^{l-N} - r^{N-l}}{r^{-N} - r^{N}},$$

where $r = 2 - \sqrt{3}$, and the corresponding minimum MAI power
is $\sigma_{opt}^2 = \sqrt{3}(K - 1)(r^{-2N} - r^{2N}) / [6N(r^{-2N} + r^{2N} - 2)]$. Note
that when $l \ll N$, $C_k(l) \approx (-r)^l$, which decays exponentially
with alternating sign. Moreover, the minimum MAI power is
given by $\sigma_{opt}^2 = \sqrt{3}(K - 1)/(6N)$ when $N$ is large, which increases
by 15% the number of users achieved with white sequences,
whose MAI power is $(K - 1)/(3N)$.
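
The closed-form expressions of this section can be cross-checked numerically. The sketch below assumes our reading of the formulas, namely a prefactor $1/(6N)$, the same autocorrelation $C(l) = (-1)^l (r^{l-N} - r^{N-l})/(r^{-N} - r^{N})$ for every user, and $C(N) = 0$; the function names are illustrative.

```python
import math

R = 2 - math.sqrt(3)


def optimal_autocorrelation(l, N):
    """C(l) = (-1)^l (r^(l-N) - r^(N-l)) / (r^(-N) - r^N)."""
    return (-1) ** l * (R ** (l - N) - R ** (N - l)) / (R ** (-N) - R ** N)


def mai_power(C, K, N):
    """Simplified MAI power when all K users share the autocorrelation C.

    C is a list of length N + 1 with C[N] = 0.
    """
    bracket = (2 * C[0] ** 2
               + 4 * sum(C[l] ** 2 for l in range(1, N))
               + 2 * sum(C[l] * C[l + 1] for l in range(N)))
    return (K - 1) * bracket / (6 * N)


def closed_form_minimum(K, N):
    """Minimum MAI power as given in the text."""
    return (math.sqrt(3) * (K - 1) * (R ** (-2 * N) - R ** (2 * N))
            / (6 * N * (R ** (-2 * N) + R ** (2 * N) - 2)))


K, N = 5, 8
C_opt = [optimal_autocorrelation(l, N) for l in range(N + 1)]
white = (K - 1) / (3 * N)  # MAI power of white sequences
print(mai_power(C_opt, K, N), closed_form_minimum(K, N))
print(white / closed_form_minimum(K, 64) * 8 / 64)  # ratio for N = 64
```

For large $N$ the ratio of the white-sequence MAI power to the optimal one approaches $2/\sqrt{3} \approx 1.155$, which is the 15% gain quoted in the text.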

III.  Sequences  Design  Using  Ergodic  Theory 

For  spreading  sequences  generated  by  ergodic  determin¬ 
istic  dynamical  systems,  the  performance  can  be  computed 

1This  work  is  partially  supported  by  MURI-ARO  grant  DAA 
G55-98-0269  and  NASA/Dryden  grant  NCC2-374. 


analytically by using the Birkhoff individual ergodic theorem.
The $n$-th degree ($n \ge 2$) Chebyshev polynomial, defined by
$T_n(x) = \cos(n \arccos(x))$, with invariant measure
$\rho(x)\,dx = dx / (\pi \sqrt{1 - x^2})$, is considered. The auto-correlation
functions  for  these  Chebyshev  sequences  can  be  computed  us¬ 
ing  ergodic  theory  and  is  given  by  <  Cx(l)  >=  y(5(Z)-  There¬ 
fore,  the  system  performance  of  an  asynchronous  CDMA  sys¬ 
tem  using  Chebyshev  sequences  is  identical  to  the  random 
white sequences with the MAI power $\sigma^2 = (K - 1)/(3N)$.
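
The delta-like autocorrelation of a Chebyshev sequence can be observed empirically. The following sketch iterates $x_{n+1} = T_p(x_n)$ and estimates time averages; the seed, the sequence length, and the function names are arbitrary choices made for illustration, and the estimates are only approximate.

```python
import math


def chebyshev_sequence(x0, length, degree=2):
    """Iterate x_{n+1} = cos(degree * arccos(x_n)) starting from x0."""
    seq = []
    x = x0
    for _ in range(length):
        seq.append(x)
        x = math.cos(degree * math.acos(x))
    return seq


def time_average_correlation(seq, lag):
    """Time average of x_n * x_{n+lag} over the sequence."""
    n = len(seq) - lag
    return sum(seq[i] * seq[i + lag] for i in range(n)) / n


seq = chebyshev_sequence(0.3, 20000)
# Lag 0 should be close to 1/2 (the second moment under the invariant
# measure); nonzero lags should be close to 0.
print([round(time_average_correlation(seq, lag), 3) for lag in range(4)])
```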

Since  the  auto-correlation  function  of  a  Chebyshev  se¬ 
quence  is  a  Kronecker  delta  function,  we  can  design  the  op¬ 
timal  spreading  sequences  by  passing  these  Chebyshev  se¬ 
quences  through  a  low-pass  filter  with  a  single  pole  at  (— r). 
Then  each  non-overlapped  section  of  the  output  sequences  is 
assigned to a different user. The system performance comparison
for various spreading sequences is shown in Figure 1.
These  simulation  results  show  that  the  optimal  sequences  are 
better  than  random  white  sequences  by  about  15%  in  terms 
of  allowable  number  of  users,  which  is  consistent  with  the  an¬ 
alytical  expression. 


Figure  1:  Asynchronous  CDMA  performance  comparison 
using  various  optimal  and  white  sequences. 
Abstract - An algorithm is described for demodulating
full-surface two-dimensional data, such as two-
dimensional  on-off  keying,  in  the  presence  of  two- 
dimensional  intersymbol  interference,  a  topic  that  is 
becoming  important  in  the  field  of  optical  recording. 

I.  Introduction 

Page-oriented  storage  systems  are  full-surface  recording 
systems  that  use  two-dimensional  waveforms  to  record  the 
user  data.  In  contrast  to  those  older  recording  methods  that 
define  one-dimensional  tracks  on  a  two-dimensional  surface, 
the  recording  waveforms  in  a  full-surface  recording  system  are 
truly  two-dimensional.  Data  is  densely  packed  in  both  the 
horizontal  and  the  vertical  directions  and,  in  a  high-density 
recording  system,  intersymbol  interference  will  be  present  in 
both  the  horizontal  and  the  vertical  directions  because  of  the 
need  to  pack  data  closely  in  comparison  with  the  resolution 
of  the  read  transducer,  whether  that  transducer  be  a  mag¬ 
netic  read  head  or  an  optical  lens  system.  In  a  high  density 
waveform,  the  demodulator  must  be  able  to  recover  the  stored 
user  data  in  the  presence  of  two-dimensional  intersymbol  in¬ 
terference,  as  well  as  additive  noise  and  storage  media  defects. 
A  two-dimensional  waveform  can  be  used  to  obtain  a  desired 
storage  density  only  if  each  pattern  is  uniquely  recognizable 
by  a  computationally  tractable  algorithm  so  that  the  correct 
user  data  can  be  recovered  by  the  demodulator  even  in  the 
presence  of  the  storage  impairments  mentioned  above. 

A  two-dimensional  sequence  estimation  algorithm  may  be 
regarded  as  a  generalization  of  the  Viterbi  algorithm  to  a 
two-dimensional  trellis,  which  is  a  trellis  standing  on  Z2  with 
branches  at  each  trellis  site  connecting  every  node  at  that  site 
to  one  of  the  nodes  at  each  of  the  four  nearest  neighboring 
sites.  The  algorithm  finds,  at  minimum-euclidean  distance 
from  a  given  data  array  of  numbers,  a  set  of  nodes  comprised 
of  one  node  at  each  site  such  that  nodes  at  neighboring  sites 
are  connected  by  branches.  When  used  to  demodulate  a  two- 
dimensional  senseword  with  two-dimensional  intersymbol  in¬ 
terference  in  white  gaussian  noise,  the  performance  is  nearly 
the  performance  of  a  two-dimensional  maximum-likelihood  de¬ 
modulator  provided  that  Eb/No  is  above  some  critical  value. 

II.  The  Laybourn  Algorithm 

Two-dimensional  intersymbol  interference  is  a  straightfor¬ 
ward  generalization  of  one-dimensional  intersymbol  interfer¬ 
ence  to  two  dimensions.  Algorithms  for  processing  intersym¬ 
bol  interference  and  recovering  data,  however,  do  not  gen¬ 
eralize  so  easily.  We  are  interested  in  algorithms  for  mini¬ 
mum  euclidean-distance  demodulation,  which  for  white  gaus¬ 
sian  noise  is  equivalent  to  maximum-likelihood  demodulation. 

The  generalization  of  a  trellis  to  a  two-dimensional  struc¬ 
ture  is  straightforward  in  principle,  but  it  is  not  entirely 
straightforward  to  formalize  this  generalization  or  to  por¬ 
tray  the  trellis  structure  in  a  useful  form.  Consider  the  two- 
dimensional  integer  lattice  Z2.  The  sites  of  the  trellis  are  the 


lattice  points  of  Z2.  Standing  on  each  point  of  the  lattice  Z2 
is  an  identical  column  of  nodes.  Each  node  represents  a  state. 
At each trellis site, every node at that site is connected
by branches to one or more nodes at each of the four neighboring
sites.

The  Viterbi  algorithm,  which  is  a  systematic  method  of 
finding  a  preferred  path  in  a  one-dimensional  trellis,  does  not 
generalize  directly  to  two  dimensions.  More  generally,  the 
dynamic programming principle does not directly apply to
the  problem  of  searching  a  two-dimensional  trellis. 

The notions of past and future, which are natural in one
dimension, do not have immediate counterparts in two
dimensions. Instead, more advanced notions such as neighbor,
region, inside, and outside must be introduced. Although such
notions  themselves  are  not  very  difficult,  they  are  more  dif¬ 
ficult  than  the  one-dimensional  notions  of  past  and  future. 
In  particular,  all  of  the  familiar  techniques  surrounding  the 
Viterbi  algorithm  become  much  more  difficult  when  gener¬ 
alized  to  this  two-dimensional  setting.  For  one  thing,  the 
boundary  of  a  neighborhood  is  variable  in  size,  which  is  quite 
different  from  the  one-dimensional  case.  To  construct  a  sys¬ 
tolic  two-dimensional  trellis-search  algorithm,  some  approxi¬ 
mations  are  inevitable. 

In  this  paper,  we  shall  describe  and  evaluate  a  recursive 
formulation  of  a  demodulator  for  two-dimensional  sensewords 
with  intersymbol  interference.  Because  the  first  author  for 
some  time  has  been  referring  to  the  algorithm  as  the  Laybourn 
algorithm,  we  shall  continue  to  use  this  terminology,  though 
with  some  breach  of  modesty  for  the  second  author. 

However,  if  the  signal-to-noise  ratio  (as  measured  by 
Eb/No)  is  above  a  critical  value,  which  we  may  call  the  critical 
temperature,  then  there  seems  to  be  a  kind  of  phase  transition 
in  the  nature  of  the  two-dimensional  problem.  The  structure 
of  the  problem  freezes  and,  with  high  probability,  only  local 
decisions  are  needed  to  find  the  proximate  codeword.  Con¬ 
versely,  the  structure  of  the  problem  thaws  at  low  Eb/No,  and 
an  algorithm  that  uses  only  local  decisions  will  fail. 
Abstract - We show that block coded sequences are cycloergodic,
and based on this property, we introduce a new non-probabilistic
formula to calculate the average power spectral
density  of  these  sequences.  We  present  a  new  sufficient  condi¬ 
tion  to  construct  codes  with  an  arbitrarily  high-order  spectral- 
null  at  zero  frequency.  Given  this  condition,  we  outline  two  new 
coding  schemes  and  use  them  to  generate  new  classes  of  efficient 
high-order  spectral-null  sequences. 

I.  Introduction 

First-order  spectral-null  codes  have  received  considerable  attention 
in  the  literature.  Recently,  high-order  spectral-null  sequences  have 
also  attracted  interest  [1].  In  high-order  spectral-null  codes,  the 
power  spectrum  of  the  encoded  sequence  and  its  higher-order  de¬ 
rivatives  are  zero  at  zero  frequency  to  achieve  a  wide  spectral  notch 
at low frequency. We call sequences with an $M$th-order spectral
null at zero frequency dc$^M$-codes and individual words
dc$^M$-words, and denote the sets of these sequences and words by
$\Phi(N, M)$ and $\varphi(N, M)$ respectively, where $N$ is the codeword
length.  Let  the  source  word  length  be  /.  Such  block  codes  are  called 
UN  codes. 

Construction of high-order spectral-null sequences is usually
based on concatenation of block codewords with the same order
spectral-null [1]. We denote by $\Phi(N, M \mid [\varphi(N, M)])$ the subset of
$\Phi(N, M)$ for this coding scheme ($M$th-order zero-disparity). In this
approach, as the order of the spectral null increases, the code rate
becomes very low, which makes coding impractical. One proposed
approach [2] to increase the number of usable codewords of those
codes is to employ codewords with fixed moments. We introduce a
new class of efficient dc$^M$-codes composed of block dc$^{M-1}$-words.
We denote this subset of dc$^M$-codes as $\Phi(N, M \mid [\varphi(N, M - 1)])$.
A comparison of the cardinality of these schemes is shown in Fig. 1.


II.  Block  Encoded  Sequences  With  High-Order 
Spectral-Null 

Assuming  that  the  input  source  sequence  of  a  line  encoder  is 
composed  of  independent  identically  distributed  (i.i.d.)  symbols,  the 
state  sequence  is  a  stationary  Markov  chain.  We  show  that  if  the 
Markov chain has a finite number of states and is irreducible and
aperiodic (i.e. ergodic), the output codeword sequence is both
wide-sense stationary and wide-sense ergodic, and the output symbol
sequence $X(n) = \{x_n\}$ is both wide-sense cyclostationary and
wide-sense cycloergodic with the period of the word length. We show
that  the  statistical  mean  and  autocorrelation  of  a  cycloergodic  cy¬ 
clostationary  stochastic  process  evaluated  by  averaging  over  en¬ 
semble  statistics  are  identical  to  the  asymptotic  time  average  and 
time  average  autocorrelation  of  one  sample  function  of  the  process. 

The limiting time-average autocorrelation and limiting power
spectrum are a Fourier transform pair [3, p. 74]. Therefore, the
average power spectrum $H_x(\omega)$ equals the limiting time-averaged
(smoothed) periodogram given in [3, p. 81]:


Hx  (a)  =  lim  lim - V  -  V  r  e  .  (1) 

‘-*-1 ^~2K  +  \£k  2L  +  1  Jt-L 
By taking derivatives of (1), we conclude that if the first $M - 1$
moments of every codeword are zero and the $(M - 1)$th moment of the
sequence is bounded, then the sequence formed by concatenating these
codewords has an $M$th-order spectral null at zero frequency.
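
The moment condition can be checked on individual words with a few lines of Python. The sketch assumes that the $k$th moment of a word $x_0, \ldots, x_{N-1}$ is $\sum_n n^k x_n$, a common convention that the text does not spell out; the function names are ours.

```python
def moment(word, k):
    """k-th moment of a word: sum over n of n^k * x_n (assumed convention)."""
    return sum((n ** k) * x for n, x in enumerate(word))


def zero_moment_count(word, max_order=8):
    """Number of consecutive vanishing moments, starting with the 0th."""
    count = 0
    for k in range(max_order):
        if moment(word, k) != 0:
            break
        count += 1
    return count


print(zero_moment_count([1, -1]),                       # zero disparity only
      zero_moment_count([1, -1, -1, 1]),                # moments 0 and 1 vanish
      zero_moment_count([1, -1, -1, 1, -1, 1, 1, -1]))  # moments 0, 1, 2 vanish
```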

III. Constructions of DC$^M$-Codes
We present explicit constructions to implement spectral-null
codes of order $M$ using (A) codewords from the set
$\Phi(N, M \mid [\varphi(N, M - 1)])$ and (B) codewords from
$\Phi(N, M \mid [\varphi(N, M - 1), \varphi(N, M)])$. We propose two methods of
doing so: with coding tables and through guided scrambling (GS)
[4]. Define two codewords to be a codeword pair if they have the
same zero moments and if their nonzero moments have the same
absolute values and opposite signs. In approach (A), a source word
is mapped to a set of codewords that contains at least one pair
of codewords from $\varphi(N, M - 1)$. The source words are encoded so as
to satisfy the requirement of a bounded $(M - 1)$th moment of the coded
sequence. In approach (B), the only difference from (A) is that a
source word is assigned either to a unique codeword from $\varphi(N, M)$ or
to at least one pair of codewords from $\varphi(N, M - 1)$. This improves code
efficiency and performance. Fig. 2 presents power spectrum results.
Fig. 1: Comparison of cardinality (horizontal axis: codeword length N).

Fig. 2: Power spectrum of dc-, dc$^2$-, and dc$^3$-codes of zero-disparity
scheme (solid lines) and new coding schemes (dashed lines); horizontal
axis: normalized frequency.
Abstract - We establish the Gaussianity of the
output multiple-access interference (MAI) of the linear
MMSE receiver in a large DS-CDMA network.

I.  INTRODUCTION 

Bit  error  probability  (BEP)  is  an  important  performance  mea¬ 
sure  in  wireless  communications,  and  is  determined  by  the 
overall  interference  consisting  of  the  MAI  and  background 
noise.  In  this  paper,  we  study  the  behavior  of  MAI  at  the 
output  of  the  minimum-mean-square-error  (MMSE)  receiver 
employed  in  a  DS-CDMA  system.  We  focus  on  systems  with 
random  spreading.  The  random  signature  model  is  applicable 
to  many  scenarios,  for  example,  systems  employing  very  long 
pseudo-random  spreading  sequences,  and  systems  in  which  the 
signatures  of  the  users  are  repeated  from  symbol  to  symbol, 
but  they  are  randomly  and  independently  selected  initially. 
By  exploiting  results  in  martingale  limit  theory  and  random 
matrix  theory,  we  show  that  as  the  processing  gain  increases, 
(1)  the  output  MAI  of  the  MMSE  receiver  is  asymptotically 
Gaussian;  and  (2)  for  almost  every  realization  of  the  signa¬ 
tures  and  received  powers,  the  conditioned  distribution  of  the 
output  MAI  converges  to  the  same  Gaussian  distribution  as 
in  the  unconditional  case.  These  results  are  quite  general  and 
are  useful  for  performance  analysis  such  as  the  calculation  of 
the bit error probability. We note that Verdú and Shamai [2]
obtained that for almost every choice of signatures, the
output MAI of the conventional receiver converges to a Gaussian
random variable, while Poor and Verdú [1] established the
Gaussian nature of the MAI-plus-noise at the output of the
MMSE receiver in several asymptotic scenarios (the output
MAI vanishes in these scenarios).

II.  MAIN  RESULTS 

Consider the following discrete-time model for the uplink of a
synchronous CDMA system. The baseband received signal at
the front end of the receiver is

$$Y^{(N)} = \sum_{i=1}^{K} \sqrt{P_i}\, b_i s_i + V,$$

where the $b_i$'s are the information symbols, the $P_i$'s are the
received powers, and $V$ is background noise that comes from
the sampling of the ambient white Gaussian noise. We assume
that users choose their signatures randomly and independently.
We restrict our attention to binary signatures. The model for
binary random signatures is as follows:
$s_i = \frac{1}{\sqrt{N}} (s_{i1}, \ldots, s_{iN})^T$, where the $s_{in}$'s are i.i.d. with
$P\{s_{in} = 1\} = P\{s_{in} = -1\} = \frac{1}{2}$, $n = 1, \ldots, N$ and
$i = 1, \ldots, K$. Moreover, in a practical wireless system, fading
is ubiquitous, making perfect power control impossible.
Therefore, we assume that the received powers are random.

We consider user 1 without loss of generality. The MMSE
receiver generates an output of the form $c^t Y^{(N)}$, where $c$ is
chosen to minimize the mean square error

J  =  E[(c,YW  -brflP^S], 

where  S  is  the  collection  of  signatures  of  all  the  users.  After 
some  algebra,  we  can  get  the  expressions  for  the  linear  MMSE 
receiver  and  hence  the  output  MAI. 

Our results are asymptotic in nature, with both $K$ and $N$
going to infinity, while keeping $K/N$ fixed. First we impose
the following assumptions on the received powers:

(3.B1) The empirical distribution function of $\{\mu_1, \ldots, \mu_K\}$
converges weakly to a distribution function $F_\mu$;

(3.B2)  The  second  moments  of  the  received  powers  are 
bounded. 

Theorem 2.1 (Unconditional MAI) Suppose Conditions
3.B1 and 3.B2 hold. Then the output MAI of the MMSE
receiver has a limiting Gaussian distribution (as $N \to \infty$).

To  establish  the  asymptotic  Guassianity  of  the  output  MAI 
conditioned  on  the  signatures  and  received  powers,  we  need 
a  stronger  form  of  regularity  on  the  received  powers.  The 
assumptions  we  impose  on  the  received  powers  are  as  follows: 

(3.C1) The joint empirical distribution function of
$\{(P_1, \mu_1), \ldots, (P_K, \mu_K)\}$ converges weakly to a
distribution function $F_{P,\mu}$ with probability one;

(3.C2) The $P_i$'s are uniformly bounded above;

(3.C3) The $\mu_i$'s are bounded below by a positive number.

Theorem 2.2 (Conditional MAI) Suppose Conditions
(3.C1)-(3.C3) hold. Then the conditional distribution of the
output MAI of the MMSE receiver, given the signatures and
the received powers, converges almost surely (as $N \to \infty$) to
the same Gaussian distribution as in the unconditional case.

Based  on  the  above  theorems,  we  conclude  that  the  overall 
interference  is  asymptotically  Gaussian,  and  that  from  the 
viewpoints  of  detection  and  channel  capacity,  the  signal-to- 
interference  ratio  (SIR)  is  the  key  parameter  that  governs  the 
system  performance. 

Abstract — We present a model for the interference
generated by a collection of geographically-distributed,
bursty transmitters. Transmitters begin transmission at
random times and locations, determined by a Poisson process
in space and time. As each transmitted signal propagates to
the receiver, it is attenuated by a power-law path loss and
shifted in phase by a random angle. We show that the combined
interference of all transmitters can be represented as a
moving average of Lévy motions: impulsive, α-stable random
processes which are analogous to filtered white noise.
Further, these results can be adapted to include the effects
of fading, delay spread, Doppler, and different modulation
schemes. The tools developed here may be useful for modeling
other impulsive phenomena, such as automobile ignition noise,
atmospheric noise, and radar clutter.

I.  Summary 

In  code-division  multiple  access  (CDMA),  transmitters 
share  a  common  frequency  band  and  are  distinguished  at  the 
receiver  on  the  basis  of  unique  signature  sequences.  It  often 
happens  that  these  sequences  are  not  orthogonal,  so  that  a 
conventional  correlation  receiver  passes  not  only  the  desired 
signal,  but  also  co-channel  interference  due  to  the  other  active 
transmitters.  When  the  number  of  interfering  transmitters  is 
small, this multiuser interference can be eliminated in principle
with multiuser detection. For a large population of transmitters,
however, this generally is not possible. Under these
circumstances,  multiuser  interference  is  often  the  main  factor 
limiting  the  performance  of  CDMA  networks. 

As  with  any  type  of  interference,  accurate  and  tractable 
models  are  essential  in  communication  system  design.  It  is 
not  surprising  then  that  there  is  an  extensive  literature  on 
modeling  multiuser  interference.  Most  of  this  work  deals 
with  interference  produced  by  a  fixed  population  of  equal  (or 
nearly  equal)  power  transmitters,  and  suggests  that  Gaussian- 
mixture models are often suitable. These assumptions are often
appropriate for the reverse link in cellular CDMA, where
power  control  prevents  any  one  mobile  from  dominating. 

A  different  perspective  on  multiuser  interference  is  offered 
by  Sousa  and  Silvester  in  [2],  and  in  subsequent  extensions 
and applications [3, 1]. In these papers, transmitters are
distributed at random locations determined by a spatial Poisson
process. There is no power control and so, due to different
path losses, the signals of different transmitters arrive at the
receiver with vastly different powers. These assumptions are
appropriate for ad-hoc packet radio networks, where there is
no central base station. Under these conditions, Sousa and
Silvester [2] obtain non-Gaussian stable probability models for
multiuser interference. Non-Gaussian stable probability
densities are often regarded as models of impulsive phenomena,
since their tails are heavier than those of the Gaussian density.

1 This work was supported in part by the National Science
Foundation under grant CCR-9903107, and by the Center for Advanced

The  results  in  [2,  3,  1]  present  a  static,  discrete-time  model 
of  multiuser  interference.  They  essentially  characterize  the 
distribution  of  a  single  interference  sample,  after  the  process 
has  been  filtered  and  sampled.  It  is  natural,  however,  to  ask 
about  the  properties  of  the  continuous-time  interference  that 
arrives at the receiver front-end. In fact, many fundamental
communications questions are more naturally addressed
in the context of continuous-time interference. For example,
do linear filtering and sampling extract sufficient statistics
for signal detection from the received signal? How does the
interference evolve with time, and how is this time dependence
affected by the modulation scheme and bandwidth of the
interfering transmitters? How is it affected by vehicular motion?
What is the dependence between interference at different
frequencies, and how is it affected by channel delay spread? All
of  these  questions  suggest  the  importance  of  examining  the 
continuous-time  behavior  of  multiuser  interference. 

In  this  paper,  we  present  a  dynamic  model  for  the 
continuous-time  interference  produced  by  a  collection  of 
geographically-distributed,  bursty  transmitters.  Transmitters 
begin  transmission  at  random  times  and  locations,  determined 
by a Poisson process in space and time. As each transmitted
signal propagates to the receiver, it is attenuated by a
power-law path loss and shifted in phase by a random angle.
We show that the combined interference of all transmitters
can be represented as a moving average of Lévy motions:
impulsive, α-stable random processes which are analogous to
filtered white noise. Further, these results can be adapted to
include the effects of fading, delay spread, Doppler, and
different modulation schemes. The tools developed here may be
useful for modeling other impulsive phenomena, such as
automobile ignition noise, atmospheric noise, and radar clutter.
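The physical ingredients named above (a spatial Poisson field of transmitters, power-law path loss, and a uniform random phase) can be illustrated with a small simulation. The sketch below draws a single static interference sample only; it is not the continuous-time moving-average representation developed in the paper. The function names, the disc-shaped region, the amplitude law `r ** (-path_loss_exp / 2)` and the near-field cutoff `r_min` are assumptions made for the illustration.

```python
import cmath
import math
import random

def poisson_count(mean, rng):
    """Sample a Poisson count by counting unit-rate exponential arrivals in [0, mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def interference_sample(density, radius, path_loss_exp, rng, r_min=1e-3):
    """One complex-baseband interference sample at a receiver at the origin.

    Transmitters form a planar Poisson field of the given density inside a
    disc of the given radius. Each contributes an amplitude
    r ** (-path_loss_exp / 2) and an independent uniform phase.
    """
    n = poisson_count(density * math.pi * radius ** 2, rng)
    total = 0j
    for _ in range(n):
        # sqrt of a uniform variate gives a point uniform over the disc area
        r = max(radius * math.sqrt(rng.random()), r_min)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        total += r ** (-path_loss_exp / 2.0) * cmath.exp(1j * phase)
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [interference_sample(0.5, 10.0, 4.0, rng) for _ in range(5)]
    print([round(abs(z), 3) for z in samples])
```

Because a transmitter that happens to fall close to the receiver dominates the sum, repeated draws show occasional very large magnitudes, which is the impulsive behaviour that motivates stable (heavy-tailed) models.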

Abstract — This paper deals with the asymptotic performance
analysis of M-estimator-based multiuser detectors designed
for noncoherent detection of DPSK signals in fading
non-Gaussian channels.

I.  Introduction 

Multiple-access  channels  are  inherently  non-Gaussian  in 
nature  due  to  the  presence  in  the  channel  of  highly  structured 
multiple-access  interference.  Moreover,  for  many  multiple- 
access channels the ambient noise is known through experimental
measurements to be decidedly non-Gaussian. The
problem of joint mitigation of multiple-access interference
(MAI) and non-Gaussian interference is a challenging one,
since the reduction of MAI often relies on linear separating
structures while the mitigation of impulsive noise typically
relies on nonlinear detectors. Nevertheless, considerable
progress has been made on this problem (see [2] for the
additive-noise  channel  model  and  [1]  for  channels  that  exhibit 
fading).  In  this  paper  the  asymptotic  probability  of  error  of 
the  M-estimator-based  multiuser  detectors  proposed  in  [1]  for 
(differential)  noncoherent  detection  of  DPSK  signals  in  fading 
non-Gaussian  channels  is  derived  under  the  assumption  that 
the  channel  fading  is  Rayleigh  distributed. 

To introduce the M-estimator-based multiuser detector, let
us consider a synchronous CDMA system where the signal of
each of K active users arrives at the receiver through an
independent, single-path, slowly-fading channel. At the receiver,
after complex basebanding, chip matched filtering, and chip
rate sampling, the resulting discrete-time signal corresponding
to the i-th signaling interval is given by

$$r_n(i) = \sum_{k=1}^{K} g_k(i)\, b_k(i)\, a_n^k + w_n(i), \qquad n = 1, \ldots, N \tag{1}$$

where N is the processing gain, $a_1^k, a_2^k, \ldots, a_N^k$ is the
normalized signature sequence of the k-th user, $g_k(i)$ is the k-th
channel fading coefficient and the sequence of noise samples
$\{w_n(i)\}$ is assumed to be a sequence of independent and
identically distributed complex random variables whose in-phase
and quadrature components are independent non-Gaussian
random variables with a common probability density function
$f$. The synchronous signal model (1) can be written in
matrix notation as

$$\mathbf{r}(i) = \mathbf{H}\,\boldsymbol{\theta}(i) + \mathbf{w}(i) \tag{2}$$

where the real vectors $\mathbf{r}(i)$, $\mathbf{w}(i)$ and $\boldsymbol{\theta}(i)$ are obtained by
stacking the real and imaginary components of the corresponding
complex vectors.

1 This work was supported in part by the U.S. National Science
Foundation  under  Grant  CCR-99-80590. 


The basic idea of M-estimator-based multiuser detection is
to detect the symbols in (2) by first estimating the vector $\boldsymbol{\theta}(i)$,
and then extracting symbol estimates from these continuous
estimates. The required estimates of $\boldsymbol{\theta}(i)$ are obtained by
using an estimator of the class of M-estimators proposed by
Huber. These estimators minimize a function $\rho(\cdot)$ (called the
penalty function) of the residuals:

$$\hat{\boldsymbol{\theta}}(i) = \arg\min_{\boldsymbol{\theta}(i) \in \mathbb{R}^{2K}} \sum_{j=1}^{2N} \rho\Big( r_j(i) - \sum_{k=1}^{2K} h_{jk}\,\theta_k(i) \Big) \tag{3}$$

where $r_j(i)$ and $\theta_k(i)$ are the j-th and the k-th elements of
the vectors $\mathbf{r}(i)$ and $\boldsymbol{\theta}(i)$, respectively, and $h_{jk}$ is the (j,k)-th
element of the matrix $\mathbf{H}$. Given such an estimator, the detected
symbols are given by
$\hat{b}_l(i) = \mathrm{sgn}\{\Re[\tilde{\theta}_l(i)\,\tilde{\theta}_l^*(i-1)]\}$,
where $\Re[\cdot]$ denotes real part and
$\tilde{\theta}_l(i) = \hat{\theta}_l(i) + j\,\hat{\theta}_{l+K}(i)$.

The asymptotic probability of error for large processing
gain ($N \to \infty$) for the M-estimator-based multiuser detectors
can be obtained taking into account that, under certain
regularity conditions, the M-estimators defined by (3) are
consistent and asymptotically normal. Specifically, the asymptotic
probability of error for the l-th user can be shown to be

$$P_l(e) = \frac{1}{2}\left[ 1 - \frac{\rho_l}{1 + \dfrac{\nu^2\,[R^{-1}]_{ll}}{\sigma^2\,\mathrm{SNR}_l}} \right] \tag{4}$$


where $\rho_l$ is the l-th channel fading correlation coefficient,
$\nu^2 = E[\psi^2(x)] / E^2[\psi'(x)]$ with $\psi(\cdot) = \rho'(\cdot)$, $[R^{-1}]_{ll}$
denotes the (l,l)-th component of the inverse of the
cross-correlation matrix of the random infinite-length signature
waveforms of the K users, $\sigma^2$ represents the variance of the
in-phase and quadrature components of the noise samples and
$\mathrm{SNR}_l = E[|g_l(i)|^2] / \sigma^2$. From (4) it follows immediately
that the asymptotic error rate of the linear decorrelating
detector (i.e., $\psi(x) = x$) with DPSK depends on the noise
distribution only through its variance (as one would expect).
Moreover, for sufficiently high values of SNR, (4) suggests that the
performance of the M-decorrelators presents an error floor that
depends mainly on the fading correlation coefficient. Specifically,
the slower the fading rate, the lower the error floor.
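The quantity $\nu^2 = E[\psi^2(x)]/E^2[\psi'(x)]$ is easy to evaluate numerically for a given influence function and noise density. The sketch below does so by trapezoidal integration. It is an illustration only: the function names, the Huber clipping constant `k`, the integration range, and the use of a Gaussian density (chosen purely as a sanity check, whereas the paper targets non-Gaussian noise) are all assumptions.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def huber_psi(x, k):
    """Huber's influence function: linear inside [-k, k], clipped outside."""
    return max(-k, min(k, x))

def huber_psi_prime(x, k):
    """Derivative of Huber's influence function (1 inside [-k, k], 0 outside)."""
    return 1.0 if abs(x) <= k else 0.0

def nu_squared(psi, psi_prime, pdf, lo=-12.0, hi=12.0, steps=100_001):
    """nu^2 = E[psi(x)^2] / (E[psi'(x)])^2 by trapezoidal integration over [lo, hi]."""
    h = (hi - lo) / (steps - 1)
    e_psi2 = 0.0
    e_dpsi = 0.0
    for i in range(steps):
        x = lo + i * h
        w = h * (0.5 if i in (0, steps - 1) else 1.0)  # trapezoid end weights
        e_psi2 += w * psi(x) ** 2 * pdf(x)
        e_dpsi += w * psi_prime(x) * pdf(x)
    return e_psi2 / e_dpsi ** 2

if __name__ == "__main__":
    pdf = lambda x: gaussian_pdf(x, 1.0)
    linear = nu_squared(lambda x: x, lambda x: 1.0, pdf)
    huber = nu_squared(lambda x: huber_psi(x, 1.5), lambda x: huber_psi_prime(x, 1.5), pdf)
    print("linear psi:", linear)
    print("Huber psi :", huber)
```

For the linear choice $\psi(x) = x$ the ratio reduces to the noise variance, in line with the remark that the linear decorrelating detector depends on the noise distribution only through its variance.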

Abstract — The large system limits of the probability
of error for multiuser Decision Feedback Detectors
(DFDs) are evaluated for synchronous CDMA. Both
Successive (S-) and Parallel (P-) DFDs are considered,
where the filters are optimized according to the
Minimum Mean Squared Error (MMSE) criterion. A
comparison of the analytical results with simulation
shows that the large system results accurately predict
the performance of systems with spreading gain > 128
over a wide range of error rates. The results also show
that the P-DFD error rate approaches the single-user
bound at high Signal-to-Noise Ratios (SNRs).

I.  Summary 

The multiuser DFD can potentially achieve higher spectral
efficiencies than the linear MMSE receiver with little added
complexity [1]. Here we are interested in the average
performance (error probability) of the multiuser DFD for uncoded
synchronous CDMA, where the average is over randomly
assigned signature sequences. Even with fixed signature
sequences, an exact analysis of error probability for a finite-size
system is difficult due to error propagation. Averaging over
the signatures further complicates the problem. Our approach
is to study the large system limit of error probability, where
this limit is defined by letting the number of users $K \to \infty$
and the processing gain $N \to \infty$ with $K/N = \alpha$ fixed. This
approach has been used in [2] to evaluate the performance of
linear MMSE receivers.

We assume the standard baseband CDMA model in which
the received vector corresponding to symbol i is

$$\mathbf{r}(i) = \mathbf{P}\,\mathbf{b}(i) + \mathbf{n}(i) \tag{1}$$

where $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_K]$ is the N x K matrix of random
spreading codes with i.i.d. elements, $\mathbf{b}(i)$ is the vector of
binary symbols across users, and $\mathbf{n}(i)$ is the noise. Here we assume
that the users are received with equal power, although our
results can be generalized to arbitrary power distributions.

The output of the DFD is

$$\mathbf{y}(i) = \mathbf{F}^\dagger \mathbf{r}(i) - \mathbf{B}^\dagger \hat{\mathbf{b}}(i) \tag{2}$$

where $\hat{\mathbf{b}}(i)$ is the vector of tentative decisions, and $\mathbf{F}$ and $\mathbf{B}$
are the feedforward and feedback filters, selected to minimize
the Mean Squared Error $E[\|\mathbf{b} - \mathbf{y}\|^2]$. For the P-DFD, the
tentative decisions are $\hat{\mathbf{b}}(i) = \mathrm{sgn}\{\mathbf{F}_{\mathrm{lin}}^\dagger \mathbf{r}(i)\}$, where $\mathbf{F}_{\mathrm{lin}}$ is the
linear MMSE filter, and the final decision is $\tilde{\mathbf{b}}(i) = \mathrm{sgn}\{\mathbf{y}(i)\}$.

Our interest is in computing the probability of error
$P_e = \Pr\{\tilde{b}_k \neq b_k\}$.

We show that as $(K, N) \to \infty$, $P_e$ for the P-DFD
approaches the limit

$$P_{\text{P-DFD}} = Q\left( \frac{1}{\sqrt{4 P_{\mathrm{lin}}\,\alpha + \sigma^2}} \right) \tag{3}$$

where $P_{\mathrm{lin}}$ is the large system probability of error for the linear
MMSE receiver.

For the S-DFD, users are decoded and cancelled successively.
Hence, $P_e$ depends on the user index k. To obtain the
large system limit, we let $x = k/K$, where $k = xK \to \infty$ and
0 < x < 1. The performance of user k + 1 depends on the
performance of users 1, ..., k. Taking the large system limit
gives the following expression for $P_{\text{S-DFD}}(x + dx)$ in terms of
$P_{\text{S-DFD}}(v)$, 0 < v < x,

$$P_{\text{S-DFD}}(x + dx) = 1 - Q(\,\cdots\,) \tag{4}$$

where $\gamma_1 = \int \cdots\, dG(\lambda)$, $G(\lambda)$ is the asymptotic eigenvalue
distribution of the covariance matrix $\mathbf{R}_U = \mathbf{P}_U \mathbf{P}_U^\dagger$, where the
columns of $\mathbf{P}_U$ are the spreading codes for the undetected users
k + 1 < m < K, and

$$\cdots\; P_{\text{S-DFD}}(v)\, dv. \tag{5}$$

The boundary condition is

$$P_{\text{S-DFD}}(0) = P_{\mathrm{lin}}(\alpha), \tag{6}$$

that is, $P_e$ for the linear MMSE receiver with load $\alpha$. The
relations (4)-(6) can be numerically integrated to obtain $P_{\text{S-DFD}}(x)$
for 0 < x < 1.

Comparisons with simulation results show that for $P_e > 10^{-3}$,
the large system results accurately predict the performance
of a finite-size system with N > 32. At high SNRs (> 10
dB), the S-DFD can achieve close to ideal-feedback performance
for loads up to f. At low loads ($\alpha$ < 0.5), a lower SNR
(around 6 dB) suffices. For loads $\alpha$ < 1, the P-DFD achieves
near single-user performance at a sufficiently high SNR.

1 This work was supported by ARO under grant DAAD19-99-1-0288.
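The behaviour of the P-DFD limit can be explored numerically. The sketch below assumes that the limit (3) has the form $Q(1/\sqrt{4 P_{\mathrm{lin}}\alpha + \sigma^2})$ with unit received power per user; this is a reading of a poorly legible formula, so treat it as an assumption rather than the authors' exact expression. The function names are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_dfd_limit(p_lin, load, noise_var):
    """Assumed large-system P-DFD error rate: Q(1 / sqrt(4 * P_lin * alpha + sigma^2))."""
    return q_function(1.0 / math.sqrt(4.0 * p_lin * load + noise_var))

def single_user_bound(noise_var):
    """Error rate of an isolated user with unit power: Q(1 / sigma)."""
    return q_function(1.0 / math.sqrt(noise_var))

if __name__ == "__main__":
    for p_lin in (1e-1, 1e-2, 1e-3, 0.0):
        print(p_lin, p_dfd_limit(p_lin, load=0.5, noise_var=0.1))
    print("single-user bound:", single_user_bound(0.1))
```

Under this assumed form, the term $4 P_{\mathrm{lin}}\alpha$ acts as residual interference from wrong tentative decisions. As $P_{\mathrm{lin}}$ goes to zero the expression reduces to the single-user bound, which matches the statement that the P-DFD approaches single-user performance at high SNR.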

Abstract — Asymptotic bounds are derived involving a
second additive term of order $\sqrt{|\ln \alpha|}$ as $\alpha \to 0$ for
the mean length of a controlled sequential discrimination
strategy s between two statistical models in a
general non-parametric setting. The parameter $\alpha$ is
the maximal error probability of s.

These results are applied to change-point detection.

I.  Introduction 

The  aim  of  the  present  report  is  three-fold: 

1.  to  construct  second-order  optimal  sequential  strategies 
strengthening  the  traditional  ones; 

2. to do this for a non-parametric setting with control, indifference
zone and general loss of power growth at infinity;

3.  to  apply  our  constructions  for  change-point  detection. 

We  continue  [1]  devoted  to  first  order  optimal  tests. 

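Let $(X, \mathcal{B}, \mu)$, $X \subset \mathbb{R}$, be a probability space, $(\mathcal{P}, d(\cdot))$
be a metric space, where $\mathcal{P}$ is a subset of the set $\mathcal{A}$ of
mutually absolutely continuous probability measures. The set
$\mathcal{P}$ is partitioned into sets $\mathcal{P}_0$, $\mathcal{P}_1$ and the indifference zone
$\mathcal{P}_+ = \mathcal{P} \setminus (\mathcal{P}_1 \cup \mathcal{P}_0)$ such that the distance between $\mathcal{P}_0$ and $\mathcal{P}_1$
is positive. We test $H_0 : P \in \mathcal{P}_0$ versus $H_1 : P \in \mathcal{P}_1$; every
decision is good for $P \in \mathcal{P}_+$.

For an $\alpha > 0$ introduce $\alpha$-strategies s satisfying the condition
$\max_{r=0,1} \sup_{P \in \mathcal{P}_r} P_P(\delta = 1 - r) \le \alpha$.

Define $z(P, Q, x) = \log \frac{dP}{dQ}(x)$, $I(P, Q) = E_P\, z(P, Q, x)$,
$I(P, \mathcal{K}) = \inf_{Q \in \mathcal{K}} I(P, Q)$ for $\mathcal{K} \subset \mathcal{P}$; $A(P) = \mathcal{P}_{1-r}$ for
$P \in \mathcal{P}_r$ as the alternative set in $\mathcal{P}$ for P. For $P \in \mathcal{P}_+$, if
$I(P, \mathcal{P}_0) \le I(P, \mathcal{P}_1)$, then $A(P) = \mathcal{P}_1$; otherwise, $A(P) = \mathcal{P}_0$.
Finally $c(P) = I(P, A(P))^{-1}$.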

For a mixed control $u = (\mu_1, \ldots, \mu_m)$ on $U = \{1, \ldots, m\}$,
$P^u$ is a mixture of measures $P^i$, $i \in U$, while $\mathcal{P}_k^u$, k = 0, 1,
are sets of corresponding distributions from $\mathcal{A}$ with a positive
distance between them.

Define $I^u(P, Q) = \sum_{i=1}^{m} \mu_i I(P^i, Q^i)$ and introduce
$k^*(P) = \max_{u \in U^*} I^u(P, A^u(P))$, $c^*(P) = k^*(P)^{-1}$, and let
$u^* = u^*(P)$ be a control such that $k^*(P) = I^{u^*}(P, A^{u^*}(P))$.

II.  Non-parametric  Second  Order  Bounds 

C1. There is c > 0 such that for all $P \in \mathcal{P}$, $Q \in \mathcal{P}$, and
$u \in U$, $E_P (z(P^u, Q^u, X))^2 \le c$.

C2. There exist t > 0 and f > 0 such that
$E_P (\sup_{Q \in \mathcal{P}} \exp(-t\, z(P, Q, X))) \le f$ for all $u \in U$ and $P \in \mathcal{P}$.

C3. $D = \int_X z_1(x)\,(a(x) b(x))^{1/2}\, dx < \infty$, where
$z_1(x) = \sup_{Q, P} |\cdots|$, $\sup_{P \in \mathcal{P}} \int_{-\infty}^{x} p(t)\,\mu(dt) \le a(x)$,
$\sup_{P \in \mathcal{P}} \int_{x}^{\infty} p(t)\,\mu(dt) \le b(x)$.

C4. There exist b > 0 and $K_1 = K_1(b)$ such that for every
n the estimate $\hat{p} = \hat{p}_n$ for the density function of i.i.d.(P)
observations $X_1, \ldots, X_n$ exists with $E_P(I(P, \hat{P})) \le K_1 n^{-b}$.


Theorem 1. i. Under the condition C1 every $\alpha$-strategy s
for the no-control problem satisfies

$$E_P N \ge c(P)\,|\log \alpha| + O\big(\sqrt{|\log \alpha|}\big) \tag{1}$$

for every $P \in \mathcal{P}$.

ii. For controlled experiments and every $P \in \mathcal{P}$ the inequality
(1) holds with $c^*(P)$ substituted for $c(P)$.

Theorem 2. For every $P \in \mathcal{P}$ under the conditions C1-C4
with b > 1 there exists an $\alpha$-strategy $s^*$ such that
$E_P N \le c(P)\,|\log \alpha| + O\big(\sqrt{|\log \alpha|}\big)$.
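The leading term $c(P)\,|\log \alpha|$ in Theorems 1 and 2 is simple to evaluate once the information numbers are known. The sketch below uses Bernoulli observations and a finite list of alternatives purely as an illustrative assumption (the paper works in a general non-parametric setting), and the function names are hypothetical. It computes only the first-order term, not the second-order correction.

```python
import math

def kl_bernoulli(p, q):
    """I(P, Q) = E_P log(dP/dQ) for Bernoulli(p) against Bernoulli(q), with 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def first_order_length(p, alternatives, alpha):
    """Leading term c(P) * |log alpha|, where c(P) = 1 / min over alternatives Q of I(P, Q)."""
    info = min(kl_bernoulli(p, q) for q in alternatives)
    return abs(math.log(alpha)) / info

if __name__ == "__main__":
    for alpha in (1e-2, 1e-4, 1e-6):
        print(alpha, first_order_length(0.5, [0.7, 0.8], alpha))
```

The closest alternative (smallest information number) determines the mean length, and the required number of observations grows linearly in $|\log \alpha|$.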

III.  Non-parametric  Testing  with  Control 

C5. Suppose a sequence of mixed controls $u_n(P)$, c > 0
and $K_2 = K_2(c)$ exist such that $u_n(\hat{p}_n)$ is a measurable
control for every n and i.i.d.(P) observations $X_1, \ldots, X_n$, $P \in \mathcal{P}$,
satisfying $E_P\,|I^{u_n(\hat{p}_n)}(P, A^{u^*(P)}(P)) - k^*(P)| \le K_2 n^{-c}$.

Theorem 3. Under the conditions C1-C5 there exists
an $\alpha$-strategy $s^*$ such that
$E_P N \le c^*(P)\,|\log \alpha| + O\big(|\log \alpha|^{1 - \min(b,c)/2}\big) + O\big(\sqrt{|\log \alpha|}\big)$
for every $P \in \mathcal{P}$.

IV.  Change-Point  Detection 

Our strategy and bounds appear well applicable to the
non-parametric detection of an abrupt change in the distribution of
an i.i.d. sequence without an indifference zone. We use the
methodology outlined in [2].

Let the observations $X_1, \ldots, X_n, \ldots$ be independent, and
for $n < \nu$ all have a distribution $P_0 \in \mathcal{P}_0$, while all the $X_n$
with $n \ge \nu$ have an unknown distribution $P_1 \in \mathcal{P}_1$, where
$\nu \ge 1$ is an unknown integer, and $I_1 = I(P_1, P_0)$.

Let N be a change-point estimate, and $a^+ = a$ if $a \ge 0$,
and $a^+ = 0$ otherwise. Following [2] we use the functional

$$E^s(N) = \sup_{\nu \ge 1} \operatorname{ess\,sup} E_\nu\big( (N - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1} \big)$$

with $P_1 \in \mathcal{P}_1$ suppressed as the optimality criterion of the
strategy s under the restriction $E_{P_0}(N) \ge \alpha^{-1}$.

Theorem 4. Under the condition C1 for every $\alpha$-strategy
and sufficiently small $\alpha$ the following lower bound is valid:
$E^s(N) \ge |\log \alpha|\, I_1^{-1} + O(\log |\log \alpha|)$.

Under the conditions C1-C4 there exists an $\alpha$-strategy $s^*$ such
that $E^{s^*}(N) \le |\log \alpha|\, I_1^{-1} + O\big(|\log \alpha|^{1 - b/2}\big) + O(\cdots)$.

Abstract — In this paper we consider the constrained
ML problem where the solution vector is restricted
to lie within a convex set. An iterative algorithm
for solving the convex-constrained ML problems
is derived where special cases correspond to
multistage interference cancellation structures.

I.  Summary 

The detection strategy usually denoted optimal multiuser detection
is equivalent to the solution of a (0,1)-constrained
maximum-likelihood (ML) problem, a problem which is known
to be NP-hard. Here we relax the constraints imposed on
the solution vector in order to formalise lower complexity ML
detectors [1]. Consider a K-user asynchronous CDMA system
with additive white Gaussian noise (AWGN) of variance
$\sigma^2 = N_0/2$. Each user transmits a block of L data symbols
using BPSK. A minimal set of sufficient statistics of dimension
LK is obtained through filtering matched to the received
spreading code of the desired user, $\mathbf{y} = \mathbf{R}\mathbf{d} + \mathbf{z}$, where $\mathbf{y}$ is the
matched filter output vector and $\mathbf{R}$ is the correlation matrix.

The general constrained ML problem for asynchronous
CDMA is described as

$$\mathbf{u} = \arg\min_{\mathbf{v} \in C} F(\mathbf{v}) = \arg\min_{\mathbf{v} \in C} \tfrac{1}{2}\mathbf{v}^T \mathbf{R} \mathbf{v} - \mathbf{v}^T \mathbf{y}. \tag{1}$$

For a closed convex set constraint, an iterative algorithm for
solving the problem in (1) can be found by considering a
variational inequality (VI) problem which has the same solution
as (1).

Definition 1 The VI problem VI(G, C) is defined as finding
a vector $\mathbf{u} \in C \subseteq \mathbb{R}^{LK}$ such that

$$(\mathbf{v} - \mathbf{u})^T G(\mathbf{u}) \ge 0, \quad \forall \mathbf{v} \in C, \tag{2}$$

where G is a given continuous function from C to $\mathbb{R}^{LK}$ and C
is a given closed convex set.

Letting $f(\mathbf{v})$ be the gradient of $F(\mathbf{v})$, the solution to the
problem is described in the following lemma.

Lemma 1 (Lemma 3.1 [2]) Let C be a closed convex set,
and let $f(\mathbf{u})$ be a continuous function. Then $\mathbf{u}$ is a solution to
$(\mathbf{v} - \mathbf{u})^T f(\mathbf{u}) \ge 0$, for all $\mathbf{v} \in C$, if and only if

$$\mathbf{u} = P_C(\mathbf{u} - \omega f(\mathbf{u})) \quad \text{for some or all } \omega > 0. \tag{3}$$

In Lemma 1, $P_C(\mathbf{y})$ is an orthogonal projection onto the
constraining set, $P_C(\mathbf{y}) = \arg\min_{\mathbf{x} \in C} \|\mathbf{x} - \mathbf{y}\|$. The following
iterative algorithm is proposed in [2] for solving a VI problem.
Conditions for convergence are considered in [1].


Algorithm 1 For any initial value $\mathbf{u}_0 \in C$, let

$$\mathbf{u}_{m+1} = \rho\, P_C[\mathbf{u}_m - \omega \mathbf{E}(\mathbf{R}\mathbf{u}_m - \mathbf{y} + \mathbf{K}(\mathbf{u}_{m+1} - \mathbf{u}_m))] + (1 - \rho)\,\mathbf{u}_m, \tag{4}$$

where $0 < \rho \le 1$, $\omega > 0$, $\mathbf{E}$ is any positive diagonal matrix,
and m = 0, 1, ..., M - 1 is the iteration index. If the orthogonal
projection operation can be decoupled into independent
element-wise projections, then $\mathbf{K}$ can be either strictly lower
triangular, strictly upper triangular or equal to the null matrix
0. Otherwise, $\mathbf{K}$ must be equal to the null matrix 0.

Algorithm 1 has the form of generalised interference cancellation.
The algorithm is serial (successive) in nature when $\mathbf{K}$
is strictly upper or lower triangular. When $\mathbf{K} = 0$, the algorithm
becomes parallel in nature, as iteration m only depends
on estimates from iteration m - 1. Let $\mathbf{R}$ be partitioned such
that $\mathbf{R} = \mathbf{D} + \mathbf{L} + \mathbf{U}$, where $\mathbf{D}$ is a diagonal matrix and $\mathbf{L}$ and
$\mathbf{U}$ are strictly lower and upper triangular, respectively. Special
cases of Algorithm 1 are equivalent to known successive
and parallel interference cancellation structures. An M-stage
successive interference cancellation (SIC) scheme is described
as

$$\mathbf{u}_{m+1} = P_C[\mathbf{D}^{-1}(\mathbf{y} - \mathbf{L}\mathbf{u}_{m+1} - \mathbf{U}\mathbf{u}_m)], \tag{5}$$

where $\mathbf{u}_0 = \mathbf{y}$. The relationship between the SIC and the
general iterative algorithm is clear from substituting $\mathbf{K} = \mathbf{L}$,
$\mathbf{E} = \mathbf{D}^{-1}$ and $\rho = \omega = 1$ into (4). Similarly, the M-stage
weighted parallel interference canceller (PIC) is described as

$$\mathbf{u}_{m+1} = P_C[\omega \mathbf{D}^{-1}\mathbf{y} + (\mathbf{I} - \omega \mathbf{D}^{-1}\mathbf{R})\,\mathbf{u}_m], \tag{6}$$

which can be derived from (4) by taking $\mathbf{K} = 0$, $\mathbf{E} = \mathbf{D}^{-1}$
and $\rho = 1$. Again $\mathbf{u}_0 = \mathbf{y}$.
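The two substitutions can be written out explicitly. In the lines below, $\rho$ denotes the relaxation weight and $\omega$ the step size of (4), and $\mathbf{R} = \mathbf{D} + \mathbf{L} + \mathbf{U}$ as defined above; nothing beyond the quantities already introduced is assumed.

```latex
% SIC: K = L, E = D^{-1}, rho = omega = 1 in (4)
\begin{align*}
\mathbf{u}_{m+1}
  &= P_C\big[\mathbf{u}_m - \mathbf{D}^{-1}\big(\mathbf{R}\mathbf{u}_m - \mathbf{y}
       + \mathbf{L}(\mathbf{u}_{m+1} - \mathbf{u}_m)\big)\big] \\
  &= P_C\big[\mathbf{u}_m - \mathbf{D}^{-1}\big((\mathbf{D} + \mathbf{U})\mathbf{u}_m
       + \mathbf{L}\mathbf{u}_{m+1} - \mathbf{y}\big)\big] \\
  &= P_C\big[\mathbf{D}^{-1}\big(\mathbf{y} - \mathbf{L}\mathbf{u}_{m+1}
       - \mathbf{U}\mathbf{u}_m\big)\big].
\end{align*}

% PIC: K = 0, E = D^{-1}, rho = 1 in (4)
\begin{align*}
\mathbf{u}_{m+1}
  &= P_C\big[\mathbf{u}_m - \omega\mathbf{D}^{-1}(\mathbf{R}\mathbf{u}_m - \mathbf{y})\big]
   = P_C\big[\omega\mathbf{D}^{-1}\mathbf{y}
       + (\mathbf{I} - \omega\mathbf{D}^{-1}\mathbf{R})\,\mathbf{u}_m\big].
\end{align*}
```

In the SIC case the term $\mathbf{D}^{-1}\mathbf{D}\mathbf{u}_m$ cancels $\mathbf{u}_m$, which is why the previous estimate enters only through the upper-triangular part $\mathbf{U}$.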

The orthogonal projection operation, essential for the
algorithm, clearly corresponds to the tentative decision function
in a cancellation structure. For a box constraint, the orthogonal
projection operation can be decoupled into independent
element-wise orthogonal projections. Element i of the orthogonal
projection vector is in this case

$$(P_B(\mathbf{y}))_i = \begin{cases} y_i & \text{if } |y_i| \le 1 \\ -1 & \text{if } y_i < -1 \\ +1 & \text{if } y_i > 1 \end{cases}$$

The function $P_B(\mathbf{y})$ is incidentally identical to the clipped soft
decision function suggested for interference cancellation [1].
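The successive and weighted parallel cancellers with the box projection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default step size `omega=0.5`, the fixed number of stages, and the 2 x 2 example matrix are assumptions, and no convergence test is included.

```python
def clip(value, lo=-1.0, hi=1.0):
    """Element-wise orthogonal projection onto the box [lo, hi] (clipped soft decision)."""
    return max(lo, min(hi, value))

def sic(R, y, stages=50):
    """Successive cancellation: element i already uses the new estimates of elements before i."""
    n = len(y)
    u = list(y)  # initial value u_0 = y
    for _ in range(stages):
        for i in range(n):
            interference = sum(R[i][j] * u[j] for j in range(n) if j != i)
            u[i] = clip((y[i] - interference) / R[i][i])
    return u

def weighted_pic(R, y, omega=0.5, stages=50):
    """Weighted parallel cancellation: every element uses only the previous stage's estimates."""
    n = len(y)
    u = list(y)  # initial value u_0 = y
    for _ in range(stages):
        u = [
            clip(u[i] - omega / R[i][i] * (sum(R[i][j] * u[j] for j in range(n)) - y[i]))
            for i in range(n)
        ]
    return u

if __name__ == "__main__":
    R = [[1.0, 0.3], [0.3, 1.0]]
    y = [0.9, -0.4]
    print("SIC:", sic(R, y))
    print("PIC:", weighted_pic(R, y))
```

In this small example the unconstrained solution of $\mathbf{R}\mathbf{u} = \mathbf{y}$ has a first element larger than 1, so the box constraint is active and both schemes settle on a point whose first element is clipped to +1.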

Abstract - A necessary and sufficient condition for the equivalence of
two test functions is stated, and a necessary and sufficient condition
for the existence of a UMP test is proposed. It is then shown, for the
first time, that in coherent radar detection a UMP test does not exist
and the ALR and GLR detectors are not equivalent.

I - Introduction

In several detection problems we face a composite hypothesis test, i.e., a
test in which at least one parameter characterizing one of the hypotheses is
unknown.

First, we discuss the meaning of equivalence of two tests and state a
necessary and sufficient condition for the equivalence of two tests in the
scalar and vector cases. We then discuss the existence of a UMP test in
two-sided hypothesis tests and give a necessary and sufficient condition
for its existence. Finally, using these theorems, we show that in coherent
radar detection in Gaussian noise a UMP test does not exist and the ALR
and GLR detectors are not equivalent.

II  -  Equivalence  of  two  tests 

In this section, we propose the concept of equivalence of two tests and the
conditions for this equivalence. The test which results from the comparison
of a function of the observation (for example A(x)) with a threshold ($\eta$) is
represented here by T(A), i.e., the test T(A) is a test with the decision rule

$$A(x) \underset{H_0}{\overset{H_1}{\gtrless}} \eta.$$

Definition: two tests are equivalent iff they have the same critical region.
Theorem (1): If two tests T(A) and T(B) are equivalent then:

$$\forall x_1, x_2 : \; A(x_1) = A(x_2) \Leftrightarrow B(x_1) = B(x_2) \tag{1}$$

On the other hand, if the functions A and B are continuous and equation (1) is
satisfied for them, the test T(A) will be equivalent either to T(B) or to
T(-B) [1].

Remark (1): Theorem (1) is also valid when we deal with the vector form
of observations (e.g. $\mathbf{x} = (x_1, \ldots, x_n)$) [1].

III - UMP test in two-sided hypothesis tests
A UMP test is the most powerful test in the class of tests of its size.
Unfortunately, there is in general no theorem in the literature that answers
the question of whether a UMP test exists in a given problem. In this
section, we discuss this problem for two-sided hypothesis tests and propose
a way to check for the existence of the UMP test.

Lemma (1): A necessary and sufficient condition for the existence of a UMP
test in a two-sided hypothesis test (e.g. $H_0 : \theta = \theta_0$ versus
$H_1 : \theta \neq \theta_0$) is that the likelihood ratio, i.e.
$LR(t; \theta) = f(t; \theta) / f(t; \theta_0)$, has the two following properties [1]:

a) For any $\theta$, the solution of $LR(t_1; \theta) = LR(t_2; \theta)$ is independent of $\theta$.

b) For any t, the sign of $\partial LR(t; \theta) / \partial t$ is independent of $\theta$.

Lemma (2): If in a test the likelihood ratio is of the form
$LR(\mathbf{x}; \boldsymbol{\theta}) = f(\mathbf{x}; \boldsymbol{\theta}) / f(\mathbf{x}; \boldsymbol{\theta}_0)$
(in which $\mathbf{x}$ and $\boldsymbol{\theta}$ are n-dimensional and k-dimensional vectors,
respectively), a necessary and sufficient condition for the existence of a UMP test
in a two-sided hypothesis test is [1]:

1) For any $\boldsymbol{\theta}$, the solution of $LR(\mathbf{x}_1; \boldsymbol{\theta}) = LR(\mathbf{x}_2; \boldsymbol{\theta})$ is independent of $\boldsymbol{\theta}$.

2) For any $\mathbf{x}$ and any component $x_i$, the sign of $\partial LR(\mathbf{x}; \boldsymbol{\theta}) / \partial x_i$ is independent of $\boldsymbol{\theta}$.

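IV - Applications to coherent radar detection
Example (1): In the detection of coherent radar signals with unknown
Doppler shift, we have the following hypothesis test:

$$\begin{cases} H_0 : \mathbf{x} = \mathbf{n} \\ H_1 : \mathbf{x} = \mathbf{s} + \mathbf{n} \end{cases} \tag{2}$$

where $\mathbf{x}$, $\mathbf{s}$ and $\mathbf{n}$ are N-tuple vectors of observed data, signal and noise,
respectively. Noise is assumed to be zero-mean Gaussian, and

$$\mathbf{s} = \nu \exp(j\varphi)\,\mathbf{S} \tag{3}$$

where $\nu$ is the amplitude, $\varphi$ is the initial phase of the signal and
$\mathbf{S} = [1, e^{j\Omega}, \ldots, e^{j(N-1)\Omega}]^T$, where $\Omega$ is the normalized Doppler
frequency.

The vector $\mathbf{s}$ is considered as an unknown vector and we can rewrite the
hypothesis testing problem as:

$$\begin{cases} H_0 : \mathbf{s} = \mathbf{0} \\ H_1 : \mathbf{s} \neq \mathbf{0} \end{cases}$$

The likelihood ratio equals [2]

$$LR(\mathbf{x}, \mathbf{s}) = \frac{f(\mathbf{x} \mid \mathbf{s})}{f(\mathbf{x} \mid \mathbf{0})} = \exp\left[ -\mathbf{s}^H \Sigma_{nn}^{-1} \mathbf{s} + 2\,\mathrm{Re}\big(\mathbf{x}^H \Sigma_{nn}^{-1} \mathbf{s}\big) \right] \tag{4}$$

where $\Sigma_{nn}$ is the noise covariance matrix. According to Lemma (2), if
the UMP test exists then the solution of the following equation should be
independent of $\mathbf{s}$:

$$LR(\mathbf{x}_1; \mathbf{s}) = LR(\mathbf{x}_2; \mathbf{s}) \;\Rightarrow\; \mathrm{Re}\big(\mathbf{x}_1^H \Sigma_{nn}^{-1} \mathbf{s}\big) = \mathrm{Re}\big(\mathbf{x}_2^H \Sigma_{nn}^{-1} \mathbf{s}\big) \tag{5}$$

But if $\mathbf{x}_2 = \mathbf{x}_1 + j\,\Sigma_{nn}\mathbf{s}$ is selected, then this solution, which depends on
$\mathbf{s}$, satisfies eq. (5). Therefore, the UMP test does not exist.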
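One candidate counterexample can be checked directly. The choice $\mathbf{x}_2 = \mathbf{x}_1 + j\,\Sigma_{nn}\mathbf{s}$ used below is an assumption made for illustration, since the formula in the source is hard to read. The check only uses the fact that $\Sigma_{nn}$ is Hermitian, so that $\mathbf{s}^H\mathbf{s}$ is real.

```latex
\begin{align*}
\mathrm{Re}\big(\mathbf{x}_2^H \Sigma_{nn}^{-1} \mathbf{s}\big)
  &= \mathrm{Re}\big((\mathbf{x}_1 + j\,\Sigma_{nn}\mathbf{s})^H \Sigma_{nn}^{-1} \mathbf{s}\big) \\
  &= \mathrm{Re}\big(\mathbf{x}_1^H \Sigma_{nn}^{-1} \mathbf{s}\big)
     + \mathrm{Re}\big(-j\,\mathbf{s}^H \Sigma_{nn} \Sigma_{nn}^{-1} \mathbf{s}\big) \\
  &= \mathrm{Re}\big(\mathbf{x}_1^H \Sigma_{nn}^{-1} \mathbf{s}\big)
     + \mathrm{Re}\big(-j\,\|\mathbf{s}\|^2\big)
   = \mathrm{Re}\big(\mathbf{x}_1^H \Sigma_{nn}^{-1} \mathbf{s}\big).
\end{align*}
```

Under that assumption the pair $(\mathbf{x}_1, \mathbf{x}_2)$ solves (5) for the given $\mathbf{s}$ but changes when $\mathbf{s}$ changes, so the solution set is not independent of $\mathbf{s}$.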

Example(2):  Here,  we  suppose  that  parameters  A  t  and  V  in  example(l), 
are  unknown  but  t?and  are  random  variables  with  uniform  distribution 


in  the  interval  )  and  V  has  Rayleigh  distribution  with  paramater  a, 
hence  we  can  use  ALR  detector  in  this  problem.  ALR  and  GLR  detectors, 
for  this  problem,  are  in  the  following  form  [2  -  4]: 


lr  (x,n  )= 


_ i_ 

l  +  2aSH 


2 

2°  *_H  Z  nnS_ 
l+2aSH  Z-’S  ) 


(6) 


A(x) 


r 


LR(x,  a  )dC2  ,  G  (x) 


max  a  [LR(  x,  O  )]  (7) 


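If it is supposed that $N = 2$ and $\Sigma_{nn} = I$, the above detectors take the following form:

$$A(x_1, x_2) = \frac{2\pi}{1 + 4\alpha} \exp\left( \frac{2\alpha (|x_1|^2 + |x_2|^2)}{1 + 4\alpha} \right) I_0\!\left( \frac{4\alpha |x_1| |x_2|}{1 + 4\alpha} \right) \qquad (8)$$

$$G(x_1, x_2) = \frac{1}{1 + 4\alpha} \exp\left( \frac{2\alpha (|x_1| + |x_2|)^2}{1 + 4\alpha} \right) \qquad (9)$$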

where $I_0(\cdot)$ is the modified Bessel function. We see that $G(x_1, x_2)$ is constant over $|x_1| + |x_2| = \text{const}$, but $A(x_1, x_2)$ is not constant over $|x_1| + |x_2| = \text{const}$; therefore eq. (1) is not satisfied, and consequently the GLR and ALR detectors are not equivalent.
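
The non-equivalence claim can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes N = 2, identity noise covariance, the steering vector S = (1, e^{jΩ}) and Ω uniform on [0, 2π); the function names `lr`, `glr` and `alr` and the grid size are illustrative choices.

```python
import cmath
import math

def lr(x, omega, alpha):
    """LR(x, Omega) of eq. (6) for N = 2 and identity noise covariance."""
    s = (1.0 + 0j, cmath.exp(1j * omega))       # assumed steering vector S
    denom = 1.0 + 2.0 * alpha * 2.0             # 1 + 2*alpha*S^H S, with S^H S = 2
    corr = sum(complex(xi).conjugate() * si for xi, si in zip(x, s))  # x^H S
    return math.exp(2.0 * alpha * abs(corr) ** 2 / denom) / denom

def glr(x, alpha, grid=2048):
    """G(x): maximum of LR over a uniform grid of Omega in [0, 2*pi)."""
    return max(lr(x, 2.0 * math.pi * k / grid, alpha) for k in range(grid))

def alr(x, alpha, grid=2048):
    """A(x): Riemann-sum approximation of the integral of LR over [0, 2*pi)."""
    total = sum(lr(x, 2.0 * math.pi * k / grid, alpha) for k in range(grid))
    return 2.0 * math.pi * total / grid

# Two observations with the same |x1| + |x2| = 2
x_a = (1.0, 1.0)
x_b = (2.0, 0.0)
alpha = 0.5
print(glr(x_a, alpha), glr(x_b, alpha))   # equal: G depends only on |x1| + |x2|
print(alr(x_a, alpha), alr(x_b, alpha))   # different: A does not
```

Both observations lie on the same contour of |x1| + |x2|, so the GLR statistic coincides while the ALR statistic separates them.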
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Abstract  —  Approximate  simultaneous  orthogonal 
expansions  of  two  second-order  stochastic  processes 
are defined and their convergence is shown. The
technique is based on the Rayleigh-Ritz method to
solve the homogeneous equation involving both
covariance kernels simultaneously. Finally, two specific
applications  of  these  finite  expansions  are  given:  in 
the  Gaussian  estimation  and  detection  problems. 

I.  Introduction 

This  paper  is  concerned  with  the  simultaneous  diagonalization 
of  two  covariance  kernels  and  its  application  to  second-order 
stochastic  processes.  Several  approaches  have  been  developed 
to expand two processes simultaneously (e.g., [1]-[3]). These
methodologies allow both processes to be expanded in the
same set of functions with uncorrelated coefficients. A number
of applications of such expansions can be found in the literature.
For example, in [1] an extension of Shannon’s definition
for  the  information  content  of  a  Gaussian  process  in  Gaussian 
noise  is  provided,  Kadota  gives  a  solution  to  the  problem  of 
estimating  a  Gaussian  signal  in  noise  [4],  and  Root  [2]  and 
Pitcher  [5]  apply  them  in  the  Gaussian  detection  problem. 

From the practical standpoint, simultaneous orthogonal expansions of two stochastic processes are of very limited use because there is no standard method to find the eigenvalues and eigenfunctions of the operator involved. Kadota gives two examples illustrating an indirect method to obtain the expansion coefficients and the expanding functions [6]. However, this method requires computing the set of eigenfunctions of a covariance kernel, which is generally a difficult task and sometimes impossible.

Our aim is to provide a methodology overcoming the difficulty of computing the true simultaneous eigenvalues and eigenfunctions. For this purpose, we will apply the Rayleigh-Ritz method to solve numerically the homogeneous equation involving both covariances simultaneously. The result obtained is an approximate procedure allowing the simultaneous diagonalization of two covariance kernels and, as a consequence, the definition of the so-called approximate simultaneous orthogonal expansions of two stochastic processes.

II.  Approximate  Simultaneous  Orthogonal 
Expansions 

Since the most general solution for expanding two stochastic processes simultaneously is given in [3], we consider this approach as the basis of the present paper. We assume the conditions on the two covariance kernels imposed in the above work and the additional assumption of imperfect detection [2]. Thus, it can be shown that the approximate simultaneous eigenvalues and eigenfunctions, computed by applying the Rayleigh-Ritz method to the associated homogeneous equation, converge to the true ones. This result yields a way of approximating the expanding functions and the expansion coefficients. Moreover, it allows us to obtain an approximate method to diagonalize both covariances simultaneously and also to define the approximate simultaneous orthogonal expansions of two stochastic processes. The convergence of these finite expansions is also shown. The results remain valid under differentiation. To conclude this section, the implementation of the method is illustrated by an example.

III.  Applications 

The first application considered is concerned with the problem of estimating the mth quadratic-mean derivative of a Gaussian signal in independent Gaussian noise. On the basis of Kadota’s results [4], a suboptimal estimate approaching the optimal estimate as the length of the series goes to infinity is proposed and an expression of its error variance is given. Furthermore, it is shown that the sequence of mean-squared estimation errors resulting from the suboptimal estimate converges to the minimum error resulting from the optimal one.

The second application addresses the Gaussian nonsingular
detection  problem.  Specifically,  we  propose  an  approximate 
log-likelihood  ratio  derived  from  the  above  finite  expansions, 
which  converges  to  the  optimum  detection  statistic  [2].  The 
advantage  of  such  approaches  is  that  they  are  computationally 
feasible. 
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I.  Extended  Summary 

We introduce and solve a seemingly basic geometrical extremal problem.

The set of vertices of weight w in the unit cube of $\mathbb{R}^n$,

$$E(n, w) = \{x^n \in \{0, 1\}^n : x^n \text{ has } w \text{ ones}\},$$

can also be viewed as the set in which constant weight codes are studied in Information Theory. Another interest there is in linear codes. This was a motivation for studying the interplay between two properties: constant weight and linearity. In particular we wanted to know

$$M(n, k, w) = \max\{|U_k^n \cap E(n, w)| : U_k^n \text{ is a } k\text{-dimensional linear subspace of } \mathbb{R}^n\},$$

that is, the maximal cardinality of a set of vectors in $E(n, w)$ whose linear span has a dimension not exceeding k. Here is our complete solution.

Theorem. Let $w \le n/2$ and $k \le n$, then

(a) $M(n, k, w) = M(n, k, n - w)$

(b) $M(n, k, w) = \begin{cases} \binom{k}{w}, & \text{if (i) } 2w \le k \\ \binom{2(k-w)}{k-w}\, 2^{2w-k}, & \text{if (ii) } k < 2w < 2(k - 1) \\ 2^{k-1}, & \text{if (iii) } 2(k - 1) \le 2w \le n. \end{cases}$

The key sets giving the lower bounds for $M(n, k, w)$ in the three cases are

(i) $S_1 = E(k, w) \times \{0\}^{n-k}$

(ii) $S_2 = E(2(k - w), k - w) \times \{10, 01\}^{2w-k} \times \{0\}^{n-2w}$

(iii) $S_3 = \{10, 01\}^{k-1} \times \{1\}^{w-k+1} \times \{0\}^{n-w-k+1}$.
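
The three key sets can be built explicitly for small parameters. The sketch below is illustrative only: it constructs each set, and compares its size with the corresponding formula and its real rank with k. The helper names (`E`, `pairs`, `key_set`, `formula`, `real_rank`) are hypothetical, and the length of the zero block in case (iii) is taken as n - w - k + 1 so that the vectors have length n.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def E(n, w):
    """All 0/1 vectors of length n with exactly w ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), w)]

def pairs(t):
    """The product set {10, 01}^t, as tuples of length 2t."""
    return [sum(choice, ()) for choice in product(((1, 0), (0, 1)), repeat=t)]

def key_set(n, k, w):
    """Key set S1, S2 or S3, chosen by the case conditions (i)-(iii)."""
    if 2 * w <= k:                                   # case (i)
        return [h + (0,) * (n - k) for h in E(k, w)]
    if w < k - 1:                                    # case (ii)
        return [h + p + (0,) * (n - 2 * w)
                for h in E(2 * (k - w), k - w) for p in pairs(2 * w - k)]
    return [p + (1,) * (w - k + 1) + (0,) * (n - w - k + 1)   # case (iii)
            for p in pairs(k - 1)]

def formula(n, k, w):
    """Right-hand side of part (b) of the theorem."""
    if 2 * w <= k:
        return comb(k, w)
    if w < k - 1:
        return comb(2 * (k - w), k - w) * 2 ** (2 * w - k)
    return 2 ** (k - 1)

def real_rank(vectors):
    """Rank over the rationals by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                f = rows[i][col] / rows[rank][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

for n, k, w in [(8, 6, 3), (8, 5, 3), (8, 4, 3)]:    # cases (i), (ii), (iii)
    S = key_set(n, k, w)
    print((n, k, w), len(S), formula(n, k, w), real_rank(S))
```

This only confirms the lower-bound side (each key set has weight-w vectors, the stated size, and a span of dimension at most k); it says nothing about optimality.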

We also present an extension to multi-sets and explain a
connection to the (simpler) Erdős-Moser problem.
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Abstract  —  We  present  new  upper  bounds  on  the 
size  of  constant-weight  binary  codes,  derived  from 
bounds  for  spherical  codes.  In  particular,  we  improve 
upon the 1962 Johnson bound and the linear programming
bound for constant-weight codes.

I.  Introduction 

An (n, d, w) constant-weight code is a binary nonlinear code
with length n and minimum Hamming distance d, where all
codewords have the same number of ones, w. The maximum
size of such a code is denoted A(n, d, w). The value of
A(n,  d,  w)  is  in  general  not  known,  but  a  number  of  lower  and 
upper  bounds  have  been  established.  See  [2-4]  for  summaries 
of  the  best  bounds  known  today. 

The  new  bounds  presented  here  are  based  on  concepts  from 
Euclidean  geometry,  in  particular,  spherical  codes.  An  (n,  s) 
spherical  code  is  a  set  of  points  on  the  n-dimensional  unit 
sphere  such  that  the  inner  product  of  any  two  points  is  at 
most  s.  Its  maximum  size  is  denoted  by  As(n,  s). 

II.  Improved  Johnson  Bound 

Through an elementary mapping from binary space to Euclidean space, we obtain the following upper bound. It is equivalent to the well-known Johnson bound from 1962 [2] for $b > \delta/(n + 1)$ and improves on it for $0 < b \le \delta/(n + 1)$.

Theorem 1. Let $b = \delta - w(n - w)/n$. Then

$A(n, 2\delta, w) \le \lfloor \delta/b \rfloor$, if $b \ge \delta/n$

$A(n, 2\delta, w) \le n$, if $0 < b < \delta/n$

$A(n, 2\delta, w) \le 2n - 2$, if $b = 0$

Proof: Consider any constant-weight code $\mathcal{C}$ with parameters $(n, 2\delta, w)$ and map it into Euclidean space by replacing the binary components 0 and 1 with, respectively, 1 and $-1$. After translation and scaling, this yields an $(n - 1, s)$ spherical code, where $s = 1 - \delta n/(w(n - w))$. Since its size is upper-bounded by $A_S(n - 1, s)$, so is the size of $\mathcal{C}$. Applying known values of $A_S(n - 1, s)$ for $s \le 0$ [1] completes the proof. ∎

Some values of A(n, d, w) for which Theorem 1, in conjunction with known lower bounds [3], yields previously unknown exact values are A(20, 10, 9) = 20, A(21, 10, 8) = 21, A(24, 10, 7) = 24, A(24, 12, 11) = 24, A(26, 12, 9) = 26, and, somewhat surprisingly, A(28, 14, 12) = A(28, 14, 13) = 28.
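
The bound of Theorem 1 is easy to evaluate exactly. The sketch below uses rational arithmetic to avoid rounding problems near the case boundaries; the function name `improved_johnson_bound` and the choice to return `None` when b < 0 (where the theorem makes no statement) are assumptions of this illustration.

```python
from fractions import Fraction
from math import floor

def improved_johnson_bound(n, delta, w):
    """Upper bound on A(n, 2*delta, w) from Theorem 1, or None if b < 0."""
    b = Fraction(delta) - Fraction(w * (n - w), n)
    if b < 0:
        return None                      # Theorem 1 does not apply
    if b == 0:
        return 2 * n - 2
    if b >= Fraction(delta, n):
        return floor(Fraction(delta) / b)
    return n                             # 0 < b < delta/n

# Parameters quoted in the text, written as (n, d, w) with d = 2*delta
for n, d, w in [(20, 10, 9), (21, 10, 8), (24, 10, 7), (24, 12, 11),
                (26, 12, 9), (28, 14, 12), (28, 14, 13)]:
    print((n, d, w), improved_johnson_bound(n, d // 2, w))
```

For each of the listed parameter sets the bound evaluates to n, which is the upper-bound half of the exact values stated above; the matching lower bounds come from [3].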

III.  Improved  Linear  Programming  Bound 

The distance distribution of any binary code $\mathcal{C}$ is defined
as $A_i = \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} |\{c' \in \mathcal{C} : d(c, c') = i\}|$ for $i = 0, \ldots, n$,
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dation  and  by  the  David  and  Lucile  Packard  Foundation. 
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where $d(\cdot, \cdot)$ denotes the Hamming distance. The linear programming bound for a constant-weight code with $w \le n/2$ is $A(n, 2\delta, w) \le 1 + \max \sum_{i=\delta}^{w} A_{2i}$, where the maximum is taken over all $\{A_i\}$ that satisfy certain well-known constraints [2].

We propose an additional constraint in the maximization, which sharpens the bound. In the following theorem, $T'(w_1, n_1, w_2, n_2, d)$ and $T(w_1, n_1, w_2, n_2, d)$ denote the maximum size of an $(n_1 + n_2, d, w_1 + w_2)$ constant-weight code in which the number of ones in the first $n_1$ positions of all codewords is, respectively, at most $w_1$ and exactly $w_1$.

Theorem 2. For all $i, j \in \{\delta, \delta + 1, \ldots, w\}$ with $i \ne j$,

$P_{ji} A_{2i} + (P_i - P_{ij}) A_{2j} \le P_i P_{ji}$, if $P_{ij}/P_i + P_{ji}/P_j > 1$

$(P_j - P_{ji}) A_{2i} + P_{ij} A_{2j} \le P_j P_{ij}$, if $P_{ij}/P_i + P_{ji}/P_j > 1$

$P_j A_{2i} + P_i A_{2j} \le P_i P_j$, if $P_{ij}/P_i + P_{ji}/P_j \le 1$

where $P_i$, $P_j$, $P_{ij}$, and $P_{ji}$ are any numbers that satisfy

$P_i \ge T(i, w, i, n - w, 2\delta)$

$P_j \ge T(j, w, j, n - w, 2\delta)$

$P_{ij} \ge \min\{P_i, T'(w - \delta, j, \delta - w + i, n - w - j, 2\delta - 2w + 2i)\}$, if $i + j \le n - \delta$

$P_{ji} \ge \min\{P_j, T'(w - \delta, i, \delta - w + j, n - w - i, 2\delta - 2w + 2j)\}$, if $i + j \le n - \delta$

$P_{ji} = P_{ij} = 0$, if $i + j > n - \delta$.

The entities $T$ and $T'$ can be upper-bounded using bounds for spherical codes and so-called zonal spherical codes. Details and proofs are given in [1], which also contains several other new bounds, a survey of known bounds on A(n, d, w), and updated tables of A(n, d, w) for $n \le 28$.

New  upper  bounds  obtained  through  Theorem  2  include 
A(20, 8, 9)  <  195,  A(21,8, 9)  <  320,  A(22,8,10)  <  641, 
A(24, 8, 11)  <  2188,  and  A(23, 10, 9)  <  81. 

References 

[1] E. Agrell, A. Vardy, and K. Zeger, “Upper bounds for constant-weight codes,” IEEE Trans. Inform. Theory, submitted December 15, 1999, available online at www.chl.chalmers.se/~agrell.

[2]  M.R.  Best,  A.E.  Brouwer,  F.J.  MacWilliams,  A.M.  Odlyzko,  and 
N.J.A.  Sloane,  “Bounds  for  binary  codes  of  length  less  than  25,” 
IEEE  Trans.  Inform.  Theory,  vol.24,  pp.  81-93,  Jan.  1978. 

[3]  A.E.  Brouwer,  J.B.  Shearer,  N.J.A.  Sloane,  and  W.D.  Smith, 
“A  new  table  of  constant  weight  codes,”  IEEE  Trans.  Inform. 
Theory,  vol.  36,  pp.  1334-1380,  Nov.  1990. 

[4]  R.L.  Graham  and  N.J.A.  Sloane,  “Lower  bounds  for  constant 
weight  codes,”  IEEE  Trans.  Inform.  Theory,  vol.  26,  pp.  37-43, 
Jan.  1980. 


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


On the Covering Radius of Ternary Negacyclic Codes with Length up to 26


Tsonka  S.  Baicheva1 
Institute  of  Mathematics  and 
Informatics 

Bulgarian  Academy  of  Sciences 
P.O.Box  323 

5000  Veliko  Tarnovo,  Bulgaria 
e-mail: lpmivt@vt.bia-bg.com


Abstract — The covering radius of all ternary negacyclic codes of even length up to 26 is given. The minimal distances and weight spectra of all codes were recalculated. Seven of the open cases for the least covering radius of ternary linear codes were solved, and for three other cases upper bounds were improved.

I.  Introduction 

Constacyclic codes are linear codes which are closed under constacyclic shifts of codewords. A constacyclic shift of the n-tuple $(a_0, a_1, \ldots, a_{n-2}, a_{n-1})$ yields the n-tuple $(c\,a_{n-1}, a_0, a_1, \ldots, a_{n-2})$, where c is a fixed nonzero field element. Constacyclic codes share many of the well-known algebraic properties of cyclic codes [5, ch. 7]. In particular, one way of describing a constacyclic code C is as an ideal in the ring of polynomials $F[x]/(x^n - c)$, closed under polynomial addition and multiplication modulo $x^n - c$. It can be shown that C is a principal ideal, and as such, it contains a unique monic polynomial of minimum degree, denoted g(x), that generates C, i.e. C = (g(x)). The generator polynomial g(x) must be a divisor of $x^n - c$, and the degree r of g(x) determines the redundancy of C, i.e. (g(x)) is an [n, n − r] code. We will denote n − r by k.

When c = −1 the codes are called negacyclic [3, ch. 9]. In this case g(x) is a divisor of $x^n + 1$. The polynomial $h(x) = (x^n + 1)/g(x)$ is the check polynomial of the code C.

As $x^n + 1 = (x^{2n} - 1)/(x^n - 1)$, the roots of the polynomial $x^n + 1$ are those roots of $x^{2n} - 1$ which are not roots of the polynomial $x^n - 1$. If $\alpha$ is a primitive root of the polynomial $x^{2n} - 1$, its odd powers are all roots of the polynomial $x^n + 1$, i.e. the roots of the polynomial $x^n + 1$ are the odd powers of the primitive 2n-th root of unity. Therefore we can characterize the code C by its defining set, which is the collection of all j such that $\alpha^j$ is a zero of g(x).

The covering radius R(C) of the code C is defined as the smallest integer R such that the spheres of radius R around the codewords cover the n-dimensional vector space $F_q^n$ over GF(q).

The function $t_q[n, k]$ is defined as the least value of R when C runs over the class of all linear [n, k] codes over GF(q) for a given q.
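
As a small illustration of these definitions, the sketch below builds one ternary negacyclic code by brute force and computes its covering radius exhaustively. The choice n = 4 and g(x) = x^2 + x + 2 (a divisor of x^4 + 1 over GF(3)) is an example picked for this sketch, not a code taken from the tables of [4], and the exhaustive search is not the Method 1 or Method 2 of [1].

```python
from itertools import product

Q = 3  # field size: GF(3), represented by the integers 0, 1, 2

def poly_mul_mod(a, b, n, c):
    """Product of polynomials a and b over GF(Q) modulo x^n - c.
    Polynomials are coefficient tuples, lowest degree first."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k, coef = i + j, ai * bj
            while k >= n:            # reduce using x^n = c
                k -= n
                coef *= c
            out[k] = (out[k] + coef) % Q
    return tuple(out)

def constacyclic_shift(v, c):
    """(a_0, ..., a_{n-1}) -> (c*a_{n-1}, a_0, ..., a_{n-2})."""
    return ((c * v[-1]) % Q,) + v[:-1]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def covering_radius(code, n):
    """Largest distance from any vector of GF(Q)^n to its nearest codeword."""
    return max(min(hamming(v, cw) for cw in code)
               for v in product(range(Q), repeat=n))

n, c = 4, -1                          # negacyclic: c = -1
g = (2, 1, 1)                         # g(x) = 2 + x + x^2
k = n - (len(g) - 1)                  # dimension n - r
code = {poly_mul_mod(m, g, n, c) for m in product(range(Q), repeat=k)}

print(len(code))                                            # 3^k codewords
print(all(constacyclic_shift(cw, c) in code for cw in code))
print(covering_radius(code, n))
```

The exhaustive search visits all q^n vectors, so it is only practical for very short codes; the lengths treated in the paper require the analytic and computer methods cited there.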

II. Ternary Negacyclic Codes with length up to 26

Table II-A from [4] was used as the source for all ternary negacyclic codes of lengths up to 26. According to [3], two codes are equivalent if their defining sets are the same up to multiplication by an integer coprime to their length. So, only nonequivalent codes were considered. Their minimal distances and weight spectra were recalculated. To determine R(C), some of the codes were handled analytically, and for the rest computer calculations by Method 1 and Method 2 as in [1] were used.

1 This work was supported by a DFG Contract.

Bounds for the function $t_3[n, k]$ for ternary codes with $n \le 27$ are given in [2, Table II]. Based on the covering radii of ternary negacyclic codes determined in this work, we have obtained some exact values and upper bounds for $t_3[n, k]$.

Proposition

1) $t_3[20, 6] = 7-8$.

2) $t_3[20, 10] = t_3[20, 11] = t_3[21, 11] = 4$.

3) $t_3[24, 12] = t_3[25, 13] = 5$.

4) $t_3[21, 10] = t_3[22, 11] = 5$.

5) $t_3[22, 10] = 5-6$.

6) $t_3[25, 12] = 5-6$.
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I.  Introduction  and  Definitions 

The concept of multicovering radius was introduced by Klapper [2] in the context of studying the security of stream ciphers. Let C be a code of length n and m be a positive integer. The m-covering radius of C is the smallest integer r such that every set of m vectors in $F^n$ is contained in at least one ball of radius r around a codeword in C.

We denote the m-covering radius of a code C by $R_m(C)$. Then $R(C) := R_1(C)$ is the covering radius of C. For results on the covering radius, see the book by Cohen et al. [1].

In general we are interested in various extremal values associated with this notion: $R_m(F^n)$ := the smallest m-covering radius among length n codes, $t_m(n, K)$ := the smallest m-covering radius among (n, K) codes.

The  purpose  of  this  paper  is  to  derive  new  bounds  by  relat¬ 
ing  the  multi-covering  radii  of  a  code  to  a  relativized  notion 
of  covering  radius.  For  generality,  we  define  this  notion  for 
multi-covering  radii,  although  we  shall  only  use  the  ordinary 
covering  radius  version. 

Definition 1 Let C and S be codes of length n, and let k be a positive integer. Then the k-covering radius of C relative to S, $R_k(S, C)$, is the smallest integer r such that for every $c^1, \ldots, c^k \in C$ there is an $s \in S$ such that $d(c^i, s) \le r$ for all $i = 1, \ldots, k$. Also, $t_k(m, C) := \min\{R_k(S, C) : |S| = m\}$.

Note that $t_k(m, F^n) = t_k(n, m)$.

II.  A  Fundamental  Identity 

Let $\bar{S}$ denote the set of word-complements of elements of a code S.

Theorem 2 If C is a code of length n then
$R_m(C) = n - t_1(m, C)$.
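
The identity of Theorem 2 can be checked by brute force on very short binary codes. The sketch below follows the definitions above literally, ranging over all m-element sets of distinct binary vectors; the function names and the example codes are illustrative assumptions.

```python
from itertools import combinations, product

def dist(u, v):
    """Hamming distance."""
    return sum(a != b for a, b in zip(u, v))

def m_covering_radius(code, n, m):
    """R_m(C): every m-set of vectors lies in a ball of this radius
    around some codeword, and no smaller radius works."""
    space = list(product((0, 1), repeat=n))
    return max(min(max(dist(c, s) for s in S) for c in code)
               for S in combinations(space, m))

def t1(code, n, m):
    """t_1(m, C): smallest r such that some m-set S has every codeword
    of C within distance r of an element of S."""
    space = list(product((0, 1), repeat=n))
    return min(max(min(dist(c, s) for s in S) for c in code)
               for S in combinations(space, m))

n = 4
examples = {
    "repetition": [(0, 0, 0, 0), (1, 1, 1, 1)],
    "even weight": [v for v in product((0, 1), repeat=n) if sum(v) % 2 == 0],
    "arbitrary": [(0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 1)],
}
for name, code in examples.items():
    for m in (1, 2, 3):
        print(name, m, m_covering_radius(code, n, m), n - t1(code, n, m))
```

The two printed values agree in every row. The search is exponential in n and m, so it serves only as a sanity check on toy examples.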

For $C = F^n$ we obtain the following corollary, which is essentially a restatement of Theorem 19.4.4 of Cohen et al. [1] (cf. also Theorem 19.4.2).

Corollary 3 For all natural numbers $n, m \ge 1$, $R_m(F^n) = n - t_1(n, m)$.

III.  The  3-covering  radius  of  Hamming  codes 

Let $H_r$ denote the Hamming code of order r. It was shown by Klapper [2] that for any $m \ge 2$ and $r \ge 2$, $2^{r-1} \le R_m(H_r) \le 2^{r-1} + c_m$, where $c_m$ is a constant depending only on m. It was also shown that $R_m(H_2) = 3$ for $m \ge 2$; for $r \ge 3$ we have $R_2(H_r) = 2^{r-1}$; and for m = 3, 4, 5 we have $2^{r-1} \le R_m(H_r) \le 2^{r-1} + 1$. However, in this last case the precise value was unknown. In this section, using Theorem 2, we determine exactly the 3-covering radius of the Hamming codes.
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Theorem 4 We have $t_1(3, H_r) = \frac{1}{2}(n - 1) = 2^{r-1} - 1$ and
$R_3(H_r) = \frac{1}{2}(n + 1) = 2^{r-1}$.

IV.  Corollaries 

It is known from Klapper [2] that for all $n \ge 3$, $R_2(F^n) = R_3(F^n) = \lceil n/2 \rceil$ and $R_4(F^n) = R_5(F^n) = \lceil (n + 1)/2 \rceil$. Using Corollary 2.2 and known results about K(n, R), the minimum cardinality of a binary code of length n and covering radius R, we can determine $R_6(F^n)$ and $R_7(F^n)$.

Theorem 5 For all $n \ge 4$ we have $R_6(F^n) = \lceil (n + 1)/2 \rceil$ and $R_7(F^n) = \lceil (n + 2)/2 \rceil$.

Since $t_1(n, 2^k)$ is bounded above by the covering radius of any binary linear code of length n and dimension k, we also obtain many bounds on multicovering radius by using known bounds on binary linear codes. The resulting tables of bounds are omitted from this abstract.

Using Corollary 2.2 and the results in Section 12.5 of Cohen et al. [1] we obtain asymptotic results on $R_m(F^n)$. For instance, using Theorems 12.5.1 (sphere-covering bound) and 12.5.10 (from Lovász, Spencer and Vesztergombi [3]) we obtain the following theorem.

Theorem 6 For all n and m, $R_m(F^n) \le n/2 + \sqrt{n \log_2 m \ln 2 / 2}$. For all n and m, $R_m(F^n) \le n/2 + 12\sqrt{m}$.

We obtain bounds on $t_1(m, C)$ by counting vectors in balls. If $|S| = m$, $R_1(S, C) = t_1(m, C)$, and k is the maximum number of vectors of C in any ball of radius $t_1(m, C)$, then $km \ge |C|$. Thus $t_1(m, C)$ must be large enough that $k \ge |C|/m$. When $m < |C|$, we must have $k \ge 2$. If $d_0$ is the largest minimum distance among the (m+1)-element subcodes of C, then any ball of radius at most $(d_0 - 1)/2$ contains at most one element of C. Thus if $t_1(m, C) \le (d_0 - 1)/2$, then k = 1, which is false. Therefore $t_1(m, C) > (d_0 - 1)/2$. That is, $t_1(m, C) \ge d_0/2$. Hence $R_m(C) \le n - d_0/2$. In particular,

Theorem 7 If the minimum distance of C is d, then
$R_m(C) \le n - d/2$.
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Abstract  —  A  simple  decoding  algorithm  for  the 
[24,12,8] extended Golay code, based on the | a+x |
b+x | a+b+x | Turyn construction is described.

I.  Introduction 

In this paper we propose a simple high-speed decoding algorithm for the [24,12,8] extended Golay code, based on the | a+x | b+x | a+b+x | Turyn construction [1]. The algorithm can be easily realized in combinational circuits. Furthermore, we show that the [24,12,8] Golay code can correct simultaneously all patterns of three or fewer random errors as well as certain patterns of quadruple errors such as 4-bit cyclic single-burst and two-dimensional byte errors.

II.  The  Decoding  Algorithm  for  Random 
Errors 

Let $C_1$ be the binary cyclic [7,4,3] Hamming code with the generator polynomial $g_1(x) = x^3 + x + 1$ over GF(2) and let $C_2$ be formed by reversing the code words of $C_1$. Let $V_1$ and $V_2$ be the [8,4,4] codes obtained by adding an overall parity check to $C_1$ and $C_2$. The code C, consisting of all vectors c = | a+x | b+x | a+b+x |, where $a, b \in V_1$ and $x \in V_2$, is the [24,12,8] extended Golay code $G_{24}$ (Turyn [1], pp. 587-588).
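
The construction just described can be reproduced by direct enumeration. The following sketch represents words as integers (bit i is the coefficient of x^i, with the parity bit placed in position 7); this bit ordering and the function names are assumptions of the illustration, not part of the decoding algorithm proposed in the paper.

```python
def hamming_7_4():
    """All 16 words of the cyclic [7,4,3] code with g1(x) = x^3 + x + 1."""
    g = 0b1011                       # coefficients of 1 + x + x^3
    words = set()
    for m in range(16):              # message polynomial of degree < 4
        w = 0
        for i in range(4):
            if (m >> i) & 1:
                w ^= g << i          # add x^i * g1(x) over GF(2)
        words.add(w)
    return words

def reverse7(w):
    """Reverse the order of the 7 code symbols."""
    return sum(((w >> i) & 1) << (6 - i) for i in range(7))

def extend(w):
    """Append an overall parity check as bit 7."""
    return w | ((bin(w).count("1") & 1) << 7)

C1 = hamming_7_4()
V1 = {extend(w) for w in C1}
V2 = {extend(reverse7(w)) for w in C1}

# Turyn construction: | a+x | b+x | a+b+x | packed into a 24-bit integer
golay = {(a ^ x) | ((b ^ x) << 8) | ((a ^ b ^ x) << 16)
         for a in V1 for b in V1 for x in V2}

weights = sorted({bin(c).count("1") for c in golay})
print(len(golay), weights)
print(sorted(V1 & V2))               # intersection code of V1 and V2
```

The printed size, the set of occurring weights and the intersection of V1 and V2 can be compared with the parameters [24,12,8] and with the intersection code used later in the decoding procedure.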

Let z = c + e be a received word, c be a code word of the Golay code and e be an error word. Represent the words z and e as $z = | z_0 | z_1 | z_2 |$ and $e = | e_0 | e_1 | e_2 |$, where $z_0 = a + x + e_0$, $z_1 = b + x + e_1$, $z_2 = a + b + x + e_2$. Under decoding we define $u_{01} = z_0 + z_1 = a + b + e_0 + e_1$, $u_{02} = z_0 + z_2 = b + e_0 + e_2$, $u_{12} = z_1 + z_2 = a + e_1 + e_2$, $u_{012} = z_0 + z_1 + z_2 = x + e_0 + e_1 + e_2$. Calculate the syndromes of these words and the overall parity checks $A_i$ of the words $z_i$, i = 0, 1, 2. For an error word e of weight wt(e) < 8, the syndromes $S(u_{02}) = S(u_{12}) = S(u_{012}) = 0$ if and only if e = 0. $S(z) = (S(u_{02}), S(u_{12}), S(u_{012}))$ is called the syndrome of z. Let $z^{(1)} = c^{(1)} + e^{(1)}$ and $z^{(2)} = c^{(2)} + e^{(2)}$, where $c^{(1)}$ and $c^{(2)}$ are words of the Golay code C and $e^{(1)}$, $e^{(2)}$ are error words. If $e^{(1)} \ne e^{(2)}$ and $wt(e^{(1)}) \le 3$, $wt(e^{(2)}) \le 4$, then $S(z^{(1)}) \ne S(z^{(2)})$. Use the syndrome S(z) and the overall parity checks $A_i$ in order to find the syndrome $S_\cap(z_\cap)$ of the word $z_\cap$ of the intersection code $V_1 \cap V_2$, that is, the extended BCH code of minimum distance 8. Then, using the decoding algorithm for this code, we define locators $\alpha_j$, $\alpha_j \in GF(2^3)$, j = 0, 1, 2, ..., 7, of errors in the words $z_0$, $z_1$ and $z_2$.

The described algorithm for random errors is suitable for implementation in combinational circuits. We estimate the number of gates of the decoder and the decoding delay. We show that the combinational decoder of the [24,12,8] extended Golay code needs no more than 295 XOR, 234 AND, 101 OR and 63 NOT gates and no more than 24 gate delays for calculating the output word.

III. Correction of single-burst and two-dimensional byte errors

We will represent the Golay code as a generalized array code [2]. Then the code word c = | a+x | b+x | a+b+x | of the Golay code C is represented as the following array


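$$\begin{array}{cccccccc}
c_{00} & c_{01} & c_{02} & c_{03} & c_{04} & c_{05} & c_{06} & c_{07} \\
c_{10} & c_{11} & c_{12} & c_{13} & c_{14} & c_{15} & c_{16} & c_{17} \\
c_{20} & c_{21} & c_{22} & c_{23} & c_{24} & c_{25} & c_{26} & c_{27}
\end{array}$$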

The error word array e is called a 4-bit cyclic single-burst error if for some i, j (where i and j are the row and column indexes of the array, respectively) the symbols $e_{i,j}, e_{i+1,j+1}, e_{i+2,j+2}, e_{i+3,j+3}$ may be equal to 1 and all other symbols of the error word e are equal to 0. The indexes i and j are taken mod 3 and mod 8 respectively. If $e_{i,j} = e_{i+1,j+1} = e_{i+2,j+2} = e_{i+3,j+3} = 1$, the error word e is called a 4-bit cyclic solid single-burst error. As the Golay code corrects all patterns of three or fewer errors, we consider only 4-bit cyclic solid single-burst errors. We show that all 4-bit cyclic solid single-burst errors have different syndromes, which also differ from the syndromes of all error words of weight $\le 3$.

The error word e is called a 4-bit two-dimensional single-byte error if for some i = 0, 1, 2 and j = 0, 2, 4, 6 the symbols $e_{i,j}, e_{i,j+1}, e_{i+1,j}, e_{i+1,j+1}$ may be equal to 1 and all other symbols of the error word e are equal to 0. The index i is taken mod 3. If $e_{i,j} = e_{i,j+1} = e_{i+1,j} = e_{i+1,j+1} = 1$, the error word e is called a 4-bit two-dimensional solid single-byte error. We show that all 4-bit two-dimensional solid single-byte errors have different syndromes, which also differ from the syndromes of all words of weight $\le 3$ and of 4-bit cyclic solid single-burst errors.
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Abstract  —  A  framework  is  presented  for  general¬ 
ized  minimum  distance  (GMD)  decoding  with  a  lim¬ 
ited  number  of  decoding  trials  and  fixed  erasing.  The 
realizable  distance  for  this  technique  is  studied. 

Generalized Minimum Distance (GMD) decoding, as introduced by Forney [1], permits flexible use of reliability information in algebraic decoding algorithms for error correction. In subsequent trials, an increasing number of the most unreliable symbols in the received sequence is erased, and the resulting sequence is fed into an algebraic error-erasure decoder, until the decoding result and the received sequence satisfy a certain distance criterion. In Forney’s original algorithm, the unique codeword (if one exists) satisfying the distance criterion is found in at most [d/2] trials, where d is the Hamming distance of the code.

Kovalev [2] considered GMD decoding algorithms in which the maximum number of trials is limited to a certain number $l \ge 1$. This restriction may decrease the error correction capabilities compared to Forney’s algorithm. Still, it is worthwhile investigating limited-trial GMD decoding, since its reduction of the delay (in case of a serial implementation) or the number of required error-erasure decoders (in case of a parallel implementation) may more than compensate for the (slightly) worse performance.

We assume the following situation. A codeword $c = (c_1, \ldots, c_n)$ from a q-ary code C of length n and Hamming distance d is transmitted over a q-ary channel. The output of the channel consists of the received q-ary vector $r = (r_1, \ldots, r_n)$ and an associated reliability vector $\alpha = (\alpha_1, \ldots, \alpha_n)$, where all $\alpha_i$ are from a set $\mathcal{R}$ which is a subset of the real interval [0, 1] containing {1}, i.e., $\{1\} \subseteq \mathcal{R} \subseteq [0, 1]$. The higher $\alpha_i$, the more reliable is the symbol $r_i$. The generalized distance between the received word r with reliability vector $\alpha$ and a q-ary vector $z = (z_1, \ldots, z_n)$ is defined as

$$d_G(z, r, \alpha) = \sum_{i:\, z_i = r_i} (1 - \alpha_i)/2 + \sum_{i:\, z_i \ne r_i} (1 + \alpha_i)/2. \qquad (1)$$
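
A minimal sketch of the generalized distance of eq. (1), together with a fixed-erasing trial loop of the kind studied in this abstract, might look as follows. The binary repetition code and its majority-vote error-erasure decoder are hypothetical stand-ins chosen only so that the example is self-contained (majority voting succeeds exactly when 2t + s < d = n); all function names are illustrative.

```python
def generalized_distance(z, r, alpha):
    """d_G(z, r, alpha) of eq. (1)."""
    return sum((1 - a) / 2 if zi == ri else (1 + a) / 2
               for zi, ri, a in zip(z, r, alpha))

def erase_least_reliable(r, alpha, count):
    """Replace the `count` least reliable symbols of r by None (erasure)."""
    order = sorted(range(len(r)), key=lambda i: alpha[i])
    erased = set(order[:count])
    return [None if i in erased else ri for i, ri in enumerate(r)]

def repetition_decoder(word):
    """Stand-in error-erasure decoder for the binary repetition code:
    majority vote over unerased symbols; None on a tie or if all are erased."""
    kept = [s for s in word if s is not None]
    ones = sum(kept)
    if 2 * ones == len(kept):
        return None
    return (1 if 2 * ones > len(kept) else 0,) * len(word)

def gmd_fixed_erasing(r, alpha, erasing_set, decoder):
    """One decoding trial per erasure count in erasing_set; the candidate
    with the smallest generalized distance to (r, alpha) is returned."""
    candidates = []
    for count in erasing_set:
        c = decoder(erase_least_reliable(r, alpha, count))
        if c is not None:
            candidates.append(c)
    if not candidates:
        return None
    return min(candidates, key=lambda c: generalized_distance(c, r, alpha))

r = (1, 1, 0, 0, 0)
alpha = (1.0, 1.0, 0.1, 0.1, 0.1)
print(gmd_fixed_erasing(r, alpha, (0, 2, 4), repetition_decoder))
```

In this example plain majority decoding (zero erasures) returns the all-zero word, while the trials with erasures return the all-one word, which has the smaller generalized distance and is therefore selected.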

In the GMD decoder, some of the most unreliable received symbols, i.e., $r_i$ with lowest $\alpha_i$, are erased. In this work, where we consider fixed erasing, the erasing procedure is based on a fixed set $\mathcal{I} = \{i_1, \ldots, i_l\}$, which is independent of the received reliability vector $\alpha$. In trial j, the $i_j$ most unreliable received symbols are erased, after which the resulting vector $r_j$ is fed into an error-erasure decoding algorithm for the code C with the property that it returns the original codeword whenever the numbers of erasures s and of errors t are such that 2t + s < d. This leads to (at most) l codewords $c_j$, among which one
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with smallest $d_G(c_j, r, \alpha)$ is chosen as the final decoding result $\hat{c}$.

For a code of Hamming distance d and length n, a reliability vector $\alpha$ of length n, and an erasing set $\mathcal{I}$, let $d_r(d, \alpha, \mathcal{I})$ be defined as the largest real number $d_r$ for which the following assertion holds: for any transmitted codeword c and any received vector r of length n with reliability vector $\alpha$ such that $d_G(c, r, \alpha) < d_r/2$, the original codeword c is delivered by the GMD decoder based on erasing set $\mathcal{I}$. For a code of Hamming distance d and length n, a reliability set $\mathcal{R}$, and a fixed erasing set $\mathcal{I}$, let the realizable distance of the associated GMD decoder be defined as the infimum of $d_r(d, \alpha, \mathcal{I})$ over all $\alpha \in \mathcal{R}^n$.

The realizable distance is an important figure of merit for a limited-trial GMD decoder. For any $\mathcal{R}$, Forney [1] has shown that by erasing 2j − 1 − d + 2[d/2] most unreliable symbols in the jth trial (j = 1, ..., [d/2]), the realizable distance is (at least) d, i.e., the full Hamming distance d is exploited. For $\mathcal{R} = [0, 1]$ and fixed erasing, Kovalev [2] calculated the loss of distance compared to Forney’s algorithm in case the maximum number of decoding trials is restricted (l < [d/2]). Here, we extend Kovalev’s result to the case that $\mathcal{R}$ is any subset of [0, 1].

For a code C of Hamming distance d and length n, a reliability set $\mathcal{R}$ with infimum $\rho$, and a fixed erasing set $\mathcal{I}$ satisfying
— u  =  io  <  0  <  ii  <•••<*/<  ij+i  =  d  +  1  and  d  —  ij  =  1(2) 
for  all  j ,  the  realizable  distance  can  be  shown  to  equal 

d+1  -  (til  +  max  (ik+i  -  ik)-  (2) 

l  k=0,...,l 

Note that the realizable distance depends on the code C only by its Hamming distance d and on the reliability set $\mathcal{R}$ only by its infimum $\rho$. For given C, $\mathcal{R}$, and $l \ge 1$, the maximum realizable distance among all erasing sets $\mathcal{I}$ of size l can be shown to equal

s(*n££l)  if  0  <  <  ji+r 

and  r££]  <  irfl/ij, 

<  S^LM/^-l)  if  21+1  <  P  <  2[i]-2JLrfl/iJ+i  (3) 

and  r|£l  <  Lrfl/iJ, 

„  0(rfl)  otherwise, 

where  g(x )  =  2px  +  (1  -  p)(d  +  1  -  fm/i] ). 

Abstract — For binary linear block codes, we introduce a
“multiple GMD decoding algorithm”, where GMD-like decoding is
iterated around a few appropriately selected search centers.
Compared with the original GMD decoding by Forney [1], this
decoding algorithm provides better error performance at the cost
of a moderate increase in the number of iterations of erasure and
error correction. To reduce the number of iterations, we derive
new effective sufficient conditions on the optimality of decoded
codewords.

1. Definitions: Suppose a binary linear (N, K) block code C
with minimum weight $d_{\min}$ is used over the AWGN channel using
BPSK signaling. Each codeword is transmitted with the same
probability. For a positive integer n, let $V^n$ denote the set of
all binary n-tuples. For a received sequence
$r = (r_1, r_2, \ldots, r_N)$, the hard-decision sequence
$z = (z_1, z_2, \ldots, z_N)$ of r, and an N-tuple $u \in V^N$, define
$L(u) = \sum_{i \in D_1(u)} |r_i|$, where
$D_1(u) = \{i : u_i \ne z_i \text{ and } 1 \le i \le N\}$. For nonempty
$U \subseteq V^N$, let L[U] be defined as $\min_{u \in U} L(u)$. Let v[U],
called the best in U, denote an N-tuple in U such that
L(v[U]) = L[U]. For integers $1 \le i \le j \le N$ and $u, v \in V^N$, let
$d_{H,i,j}(u, v)$ denote the Hamming distance between u and v from
the i-th bit to the j-th bit.
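
The quantities defined above can be illustrated with a minimal Python sketch. The BPSK mapping (bit 0 to +1, bit 1 to -1), the function names, and the sample values are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import List, Sequence

def hard_decision(r: Sequence[float]) -> List[int]:
    """Hard-decision sequence z of r (assumed mapping: bit 0 -> +1, bit 1 -> -1)."""
    return [0 if ri >= 0 else 1 for ri in r]

def discrepancy(u: Sequence[int], r: Sequence[float]) -> float:
    """L(u): sum of |r_i| over the positions where u differs from z."""
    z = hard_decision(r)
    return sum(abs(ri) for ui, zi, ri in zip(u, z, r) if ui != zi)

def best_in(U: Sequence[Sequence[int]], r: Sequence[float]) -> Sequence[int]:
    """v[U]: an N-tuple in U that attains L[U], the minimum of L(u) over U."""
    return min(U, key=lambda u: discrepancy(u, r))

def hamming_segment(u: Sequence[int], v: Sequence[int], i: int, j: int) -> int:
    """d_{H,i,j}(u, v) with 1-based, inclusive bit positions i..j."""
    return sum(a != b for a, b in zip(u[i - 1:j], v[i - 1:j]))

# Hypothetical received sequence and candidate set.
r = [0.9, -0.2, 1.1, -1.3]
z = hard_decision(r)
candidates = [[1, 0, 0, 0], [0, 0, 0, 1]]
best = best_in(candidates, r)
```

In this sketch a smaller L(u) means that u disagrees with the hard decisions only in unreliable positions, which is why the best N-tuple in a set is the one minimizing L.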

2. Multiple GMD decoding: For $v \in V^N$, a GMD-like
decoding with search center v, denoted GMD(v), is defined
as the iterative decoding algorithm consisting of
$p = \lfloor (d_{\min} + 1)/2 \rfloor$ stages whose j-th stage is an algebraic
decoding which corrects 2j - p - 1 erasures in the least reliable
2j - p - 1 components and p - j or fewer errors in the remaining
components of input v, for $1 \le j \le p$. The region which has not
been searched (for candidate codewords) yet up to the j-th
stage of GMD(v), denoted $R_p(v, j)$, is given by
$R_p(v, j) = \{x \in V^N : d_{H,2j'-p,N}(x, v) > p - j' \text{ for } 1 \le j' \le j\}$.
Define $R_{GMD}(v) = R_p(v, p)$.

For a positive integer h, h-GMD decoding is defined as
an iterative decoding algorithm which consists of successive
GMD($v^{(1)}$), GMD($v^{(2)}$), ..., GMD($v^{(h)}$). Here, $v^{(i)} \in V^N$
is called the i-th search center of the h-GMD decoding;
$v^{(1)} = z$, and each other $v^{(i)}$ is chosen as the best N-tuple in
the region which has not been searched by (i - 1)-GMD decoding,
that is, $v^{(i)} = v\left[\bigcap_{i'=1}^{i-1} R_{GMD}(v^{(i')})\right]$. For
$1 \le i \le h$ and $1 \le j \le p$, the j-th stage of the i-th GMD($v^{(i)}$)
decoding is called the (i, j)-th stage. Let $c^{(i)}_{best}(j)$ be the best
candidate codeword generated up to the (i, j)-th stage, if it
exists. After the (h, p)-th stage, $c^{(h)}_{best}(p)$ is output as the
decoded codeword, unless the decoding fails.

3. New Early Termination Conditions: Just after the
(i, j)-th stage,
$R(i, j) = \left(\bigcap_{i'=1}^{i-1} R_{GMD}(v^{(i')})\right) \cap R_p(v^{(i)}, j)$
is the region which has not yet been searched for candidate
codewords. For $v \in V^N$, let
$O_{d_{\min}}(v) = \{x \in V^N : d_{H,1,N}(x, v) \ge d_{\min}\}$. The following
condition is a sufficient condition on the optimality of
$c^{(i)}_{best}(j)$:

$\mathrm{Cond}^{(i)}(j)$: $L(c^{(i)}_{best}(j)) \le L\left[R(i, j) \cap O_{d_{\min}}(c^{(i)}_{best}(j))\right]$.

$\mathrm{Cond}^{(i)}(j)$ is stronger than CondTP introduced by Taipale-
Pursley [2], because R(i, j) is taken into account.

4. Simulation Results: Figure 1 shows simulation results
of block error probabilities of the extended (128, 85) BCH
code, denoted EBCH(128, 85). For comparison, the block error
probabilities for bounded distance-$t_0$
($t_0 = \lfloor (d_{\min} - 1)/2 \rfloor$) decoding and Chase decoding
algorithm II [3] are shown.

The reduction rate of 3-GMD decoding is defined as the
ratio of the number of iterations of the erasure and error
correction stage to the maximum 3p. Since sufficient conditions
can be used only when at least one candidate codeword has been
generated, rates $\mu_{TP}$ and $\mu_{new}$ denote the averages of
reduction rates obtained by using CondTP and $\mathrm{Cond}^{(i)}(j)$,
respectively, as early termination conditions over all the trials
where at least one candidate codeword is generated. Table 1 lists
$\mu_{TP}$ and $\mu_{new}$ in percentage for EBCH(64, 24) and
EBCH(128, 85) at $E_b/N_0$ = 2.0 dB and 4.0 dB.


Figure 1: Block error probability of EBCH(128, 85)

Table 1: The average of reduction rate of 3-GMD decoding

                   Eb/N0 = 2.0 dB        Eb/N0 = 4.0 dB
Code               mu_TP     mu_new      mu_TP     mu_new
EBCH(64, 24)       66.7%     61.6%       23.6%     19.8%
EBCH(128, 85)      78.4%     74.3%       21.1%     18.1%

Abstract — It is shown that there exist totally self-checking
(TSC) combinational decoders for (n, k) Hamming SEC codes if and
only if $n = 2^r - 1$, r = n - k. For shortened (n, k) Hamming SEC
codes, modified combinational totally self-checking decoders with
minimum testing delay $2 \le p \le (n - k)$ are suggested.

I.  Introduction 

The most natural architecture for on-chip error correction
is to place a single-error-correcting code along each row of the
memory [1]. To maintain a high level of reliability, faults
of the decoder as well as faults of the memory chips must be
detected. Self-checking circuits are an effective means of
achieving this. In this paper the problem of constructing totally
self-checking decoders for Hamming SEC codes is considered. It is
shown that there exist totally self-checking combinational
decoders for (n, k) Hamming SEC codes if and only if
$n = 2^r - 1$, r = n - k. For shortened (n, k) Hamming SEC codes,
modified combinational totally self-checking decoders with
minimum testing delay $2 \le p \le (n - k)$ are suggested.

II.  Totally  Self-Checking  Circuits 

We consider a combinational circuit that produces an output
vector y(x, f), which is a function of the input vector
x from the input set X and a fault $f \in F$. The standard
single stuck-at fault model is assumed ([2, pp. 249-248]). A
circuit C is fault-secure for an input set X and a fault set F
if for any input x in X and for any fault f in F, $y(x, f) \notin Y$
or $y(x, f) = y(x, \lambda)$, where $\lambda$ is the null fault. A circuit C is
self-testing for an input set X and a fault set F if for every
fault f in F there is some input x in X such that $y(x, f) \notin Y$.
An input x for which $y(x, f) \notin Y$ is called a testing pattern
for f. A circuit C is totally self-checking (TSC) if it is both
self-testing and fault-secure ([3, pp. 392-394]). Self-testing
is a rather difficult condition to satisfy perfectly. With this
difficulty in mind, the concepts of strongly fault secure logic
networks [4] and totally self-checking circuits with minimum
testing delay [5] were proposed. A TSC circuit is TSC with
minimum testing delay p = 1.
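
The fault-secure and self-testing definitions can be checked exhaustively on very small circuits. The sketch below is only an illustration under stated assumptions: the circuit is modeled as a Python function `circuit(x, f)`, the null fault is represented by `None`, and the two toy circuits (`duplicate` and `wire`) are hypothetical examples, not circuits from the paper.

```python
from itertools import product

NULL_FAULT = None  # stands for the null fault (no fault present)

def is_fault_secure(circuit, inputs, faults, code_outputs):
    """Every faulty output is either a non-code word or the correct output."""
    for x, f in product(inputs, faults):
        y = circuit(x, f)
        if y in code_outputs and y != circuit(x, NULL_FAULT):
            return False
    return True

def is_self_testing(circuit, inputs, faults, code_outputs):
    """Every fault has at least one input (testing pattern) exposing it."""
    return all(
        any(circuit(x, f) not in code_outputs for x in inputs)
        for f in faults
    )

def is_tsc(circuit, inputs, faults, code_outputs):
    """Totally self-checking: both fault-secure and self-testing."""
    return (is_fault_secure(circuit, inputs, faults, code_outputs)
            and is_self_testing(circuit, inputs, faults, code_outputs))

def duplicate(x, fault):
    """Toy circuit: copies a 1-bit input to two output lines.
    A fault is (line, value): that output line is stuck at value."""
    out = [x, x]
    if fault is not None:
        line, value = fault
        out[line] = value
    return tuple(out)

def wire(x, fault):
    """Toy circuit: a single wire; a fault is the stuck-at value."""
    return x if fault is None else fault

dup_tsc = is_tsc(duplicate, [0, 1],
                 [(0, 0), (0, 1), (1, 0), (1, 1)],
                 {(0, 0), (1, 1)})
wire_secure = is_fault_secure(wire, [0, 1], [0, 1], {0, 1})
wire_testing = is_self_testing(wire, [0, 1], [0, 1], {0, 1})
```

The duplicated output behaves like a small code space, so every stuck-at fault is eventually signalled by a non-code output; the bare wire has no redundancy, so a fault silently yields a wrong but valid output.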

III.  Modified  Self-Checking  Decoders  for 
Hamming  SEC  codes 

The decoder of the Hamming SEC code comprises the syndrome
generator (SG), the syndrome decoder (SD) and the
corrector (COR). The decoder is assumed to be a combinational
circuit. SG and COR are composed entirely of XOR
gates. SD is constructed from AND gates and NOT gates.
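
As a rough functional illustration of the SG, SD, COR structure, the following Python sketch models a (7, 4) Hamming decoder. It assumes a non-systematic parity-check matrix whose i-th column is the binary representation of i; this choice, like the function names, is an assumption for illustration and is not the systematic construction studied in the paper.

```python
from typing import List

def syndrome_generator(r: List[int]) -> int:
    """SG: XOR of the column labels (1-based positions) where r has a 1."""
    s = 0
    for pos, bit in enumerate(r, start=1):
        if bit:
            s ^= pos
    return s

def syndrome_decoder(s: int, n: int) -> List[int]:
    """SD: one-hot error-location vector (all zero for syndrome 0)."""
    return [1 if s == pos else 0 for pos in range(1, n + 1)]

def corrector(r: List[int], e: List[int]) -> List[int]:
    """COR: bitwise XOR of the received word and the error estimate."""
    return [a ^ b for a, b in zip(r, e)]

def decode(r: List[int]) -> List[int]:
    """Full decoder: SG followed by SD followed by COR."""
    s = syndrome_generator(r)
    return corrector(r, syndrome_decoder(s, len(r)))

# 1110000 is a codeword for this H, since 1 XOR 2 XOR 3 = 0.
codeword = [1, 1, 1, 0, 0, 0, 0]
single_error_results = []
for i in range(7):
    received = codeword.copy()
    received[i] ^= 1          # inject a single error at position i + 1
    single_error_results.append(decode(received))
```

Each one-hot output of `syndrome_decoder` corresponds to one AND gate over syndrome bits and their complements, which is why SD is the part built from AND and NOT gates.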


Let $X = \{x : x = v + e,\ v \in V,\ \mathrm{wt}(e) \le 1\}$ be an input
set and $Y = \{y : y = y(x, \lambda) = v,\ x \in X,\ v \in V\}$ be an
output set of a combinational decoder of a systematic (n, k)
Hamming SEC code V in the absence of faults. We show
that the decoder of V is totally self-checking if and only if
$n = 2^r - 1$, r = n - k.

Most of the codes for semiconductor memory applications
are shortened codes. For $n < 2^r - 1$ we construct totally
self-checking decoders with minimum testing delay
$2 \le p \le (n - k)$. We show that for any shortened (n, k) Hamming SEC
code there exists a parity-check matrix such that a decoder of
this code is TSC with minimum testing delay p = 2.

A totally self-checking code checker can be constructed for
the combinational totally self-checking decoder with testing
delay p of a systematic Hamming SEC code. This checker
consists of the regenerator of syndrome pairs and a multi-input
two-rail code checker. The description of the syndrome
regenerator and the multi-input two-rail code checker can be found in
[3, pp. 443-444, 459-466]. We show that the decoder can be
modified such that the complexity of the checker is
significantly reduced.

Abstract — An explanation is given for the paradoxical fact
that, at low signal-to-noise ratios, the systematic feedback
encoder results in fewer decoding bit errors than does a
nonsystematic feedforward encoder for the same code. The analysis
identifies a new code property, the d-distance weight density of
the code. For a given d-distance weight density, the decoding bit
error probability depends on the number of taps in the
realization of the encoder inverse. Among all encoders for a
given convolutional code, the systematic one has the simplest
encoder inverse and, hence, gives the smallest bit error
probability.

I.  Introduction 

It is well known that the free distance of a convolutional code
is the principal determiner of the burst error probability
(first-event error probability) for large signal-to-noise ratios
when maximum-likelihood decoding is used [1]. Since the free
distance is a code parameter, the burst error probability is the
same whether the convolutional code was encoded by a
nonsystematic feedforward encoder or by a systematic feedback
encoder. The decoding bit error probability, however, depends
on the encoder used. Typically at high signal-to-noise ratios,
where most of the decoding burst errors are made to codewords
at the free distance from the transmitted codeword, the
systematic feedback encoder results in more bit errors than a
nonsystematic encoder. We now explain the paradoxical fact,
often observed in practice, that, at low signal-to-noise ratios,
the systematic feedback encoder results in fewer bit errors
than does a nonsystematic feedforward encoder.

II.  d-DISTANCE  WEIGHT  DENSITY 

Our analysis is based on consideration of what we call the
d-distance weight density of the code, $p_d$, defined as the
fraction of 1's in the “detours” of weight d in the binary
convolutional code. We use this parameter in a model of the
internal codeword structure, together with the structure of
the encoder inverse, to estimate the number of information
bit errors that result from each 1 in a burst error that forms a
codeword of weight d. The weights of codewords are code
parameters, not encoder parameters, and hence these estimates
reveal which convolutional encoders give the best bit error
probability performance. At low signal-to-noise ratios, i.e., at
code rates close to channel capacity, error bursts are typically
very long, so that the codewords with weights substantially
larger than the free distance of the code primarily determine
the decoding bit-error probability.

1This  research  was  supported  by  the  Foundation  for  Strategic 
Research  -  Personal  Computing  and  Communication  under  Grant 
PCC-9706-09. 


Consider all codewords of weight d in a rate R = b/c fixed
binary convolutional code. For small d, the number of codewords
of weight d, $n_d$, is also small and the value of $p_d$ fluctuates
widely. For larger d, however, the number of codewords
of weight d increases rapidly and the value of $p_d$ stabilizes.
One finds that $p_d$ tends towards an asymptotic value as d
increases. This asymptote is slightly memory dependent. For
small memory, m, the asymptotic value is larger than for large
m. As m grows, however, $p_d$ quickly decreases to its asymptotic
value, which we denote by $p_\infty$. The d-distance weight
density, $p_d$, also depends on the code rate: the lower the rate,
the higher the value of $p_d$. To determine the asymptote $p_\infty$,
we analyzed randomly chosen rate R = b/c, time-varying binary
convolutional codes. We calculated the following values
of $p_\infty$ for some interesting rates: $p_\infty = 0.29$ for R = 1/2,
$p_\infty = 0.37$ for R = 1/3 and $p_\infty = 0.40$ for R = 1/4.

III.  Bit  Error  Probabilities  via  Encoder 
Inverses 

We now compare the decoding bit error probability for systematic
and nonsystematic encoders. For a particular encoder, let
$q_d$ denote the arithmetic average of the number of decoding
bit errors per codeword 1 taken over all codewords of weight
d. Somewhat surprisingly, $q_d$ turns out to be an affine function
of d with a slope that depends on the encoder type. The
different slopes can be explained using an argument involving
encoder inverses.

We model the appearance of 1's within a codeword of
weight d by a binary memoryless source which outputs a 1
with probability $p_d$. For brevity, we consider here only rate
R = 1/2 codes. For systematic encoders, whose encoder inverse
has only one tap, the average number of bit errors per
codeword 1 is then $q_d = 1/2$. This is reasonable since one
would expect that half of all codeword 1's occur in the
systematic bit-stream and thus create information bit errors. For
quick-look-in encoders, whose encoder inverses have two taps
[2], we obtain $q_d = 1 - p_d$. Inserting the asymptote
$p_\infty = 0.293$ for R = 1/2, we get $q_d = 0.71$. As the number of
taps in the encoder inverse increases, $q_d$ increases monotonically
to its asymptotic value of 0.85. This explains why the average
number of bit errors per codeword at distance d is larger for
nonsystematic encoders than for the systematic ones.
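
The quoted values 1/2, 0.71 and 0.85 can be reproduced with a short calculation under one reading of the memoryless model; this reading is an assumption, not a statement of the authors' derivation. Assume that each information-bit estimate is the XOR of the code bits seen by the taps of the encoder inverse, that these code bits are independent and equal to 1 with probability p, and that a rate 1/2 code emits on average 2p ones per information bit. The function names are hypothetical.

```python
def xor_one_probability(p: float, taps: int) -> float:
    """Probability that the XOR of `taps` independent Bernoulli(p) bits is 1."""
    return (1.0 - (1.0 - 2.0 * p) ** taps) / 2.0

def bit_errors_per_codeword_one(p: float, taps: int) -> float:
    """Estimated q_d for a rate 1/2 code: information-bit error probability
    divided by the expected number of codeword 1's per information bit (2p)."""
    return xor_one_probability(p, taps) / (2.0 * p)

p_inf = 0.293  # asymptotic weight density quoted for R = 1/2
q_systematic = bit_errors_per_codeword_one(p_inf, 1)   # one tap
q_quick_look = bit_errors_per_codeword_one(p_inf, 2)   # two taps
q_many_taps = bit_errors_per_codeword_one(p_inf, 50)   # many taps
```

Under these assumptions one tap gives p/(2p) = 1/2, two taps give 2p(1 - p)/(2p) = 1 - p, and the many-tap limit is 1/(4p), which for p = 0.293 is about 0.85.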

I. Introduction

Although Viterbi decoding is widely used in practical systems,
only a few studies have been made of the exact analysis of its
error probability. In [2], we proposed an analytical technique
for exact analysis of the error state probability of 2-state soft
decision Viterbi decoding. The extension to bit error probability
analysis is possible using the error state probability [3]. In
this paper, this technique is extended to the exact error state
probability in 4-state soft decision Viterbi decoding. It employs
an iterative calculation of the probability density function of
the path metric.

II.  Model  for  Analysis 

The model for analysis consists of a 4-state convolutional
encoder of rate 1/2 with generator polynomial matrix
$[1 + D + D^2,\ 1 + D^2]$ and a binary phase shift keying (BPSK)
signal set mapper. We assume that an output vector of the
encoder $(b_1, b_2)$, $b_i \in \{0, 1\}$, is transmitted as a BPSK
modulated vector $u = (u_1, u_2)$, $u_i \in \{+1, -1\}$, where
$u_i = 1 - 2b_i$, i = 1, 2.

III.  Pdf  of  Path  Metric  and  Error  State 
Probability 

For soft decision decoding, we assume the memoryless additive
white Gaussian noise (AWGN) channel with zero mean
and variance $\sigma^2$. The received vector $v = (v_1, v_2)$ can be
expressed by Gaussian random variables. The conditional pdf
of v assuming u is given by

$$f(v|u) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{(v_1 - u_1)^2 + (v_2 - u_2)^2}{2\sigma^2}\right\}. \quad (1)$$

The branch metric $M(b_1, b_2)$ is defined by the logarithm of
(1), and we can express the metric as [1]

$$M(b_1, b_2) = \ln f(v|u) = A(u_1 v_1 + u_2 v_2) + B, \quad (2)$$

where A and B are independent of the paths. Because these
constants are the same for the paths compared and the path
metric is additive, we can redefine $M(b_1, b_2) = u_1 v_1 + u_2 v_2$
as the branch metric.

Let us assume that the input sequence of the encoder is all zero,
that is, $b_1 = b_2 = 0$ and $u_1 = u_2 = 1$. The trellis diagram
is given by Fig. 1, where symbols $u_1$ and $u_2$ are both
one, and branch metrics are shown with $x = v_1 + v_2$,
$y = -v_1 + v_2$. In this figure, $S_0$, $S_1$, $S_2$, and $S_3$ indicate the
encoder states. Here, $S_0$ is the correct state, $S_1$, $S_2$ and $S_3$ are
error states, and their path metrics are 0, $k_1$, $k_2$, and $k_3$,
respectively. After $v_1$ and $v_2$ are received, the path metrics
become $\alpha_0 = \max\{x, k_2 - x\}$, $\alpha_1 = \max\{-x, k_2 + x\}$,
$\alpha_2 = \max\{k_1 + y, k_3 - y\}$, and $\alpha_3 = \max\{k_1 - y, k_3 + y\}$,
respectively. The conditional pdf of the four random variables
$\alpha_0$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ is given by the product of
$f(\alpha_0, \alpha_1|k_2)$ and $f(\alpha_2, \alpha_3|k_1, k_3)$:

$$f(\alpha_0, \alpha_1, \alpha_2, \alpha_3|k_1, k_2, k_3) = f(\alpha_0, \alpha_1|k_2)\, f(\alpha_2, \alpha_3|k_1, k_3). \quad (3)$$

In order to renew the metrics of the correct and error states
by subtracting the metric of the correct state at each transition,
we define the difference path metrics $z_1 = \alpha_1 - \alpha_0$,
$z_2 = \alpha_2 - \alpha_0$, and $z_3 = \alpha_3 - \alpha_0$. The conditional pdf of
$z_1$, $z_2$, and $z_3$ is obtained by integrating (3) to eliminate $\alpha_0$:

$$f(z_1, z_2, z_3|k_1, k_2, k_3) = \int_{-\infty}^{\infty} f(\alpha_0, z_1 + \alpha_0, z_2 + \alpha_0, z_3 + \alpha_0|k_1, k_2, k_3)\, d\alpha_0. \quad (4)$$

The pdf of $z_1$, $z_2$, and $z_3$ after j transitions is given by

$$f^{(j)}(z_1, z_2, z_3) = \iiint f(z_1, z_2, z_3|k_1, k_2, k_3)\, f^{(j-1)}(k_1, k_2, k_3)\, dk_1\, dk_2\, dk_3. \quad (5)$$

The initial path metric $k_1, k_2, k_3$ is constant, and its pdf is
represented by $f^{(0)}(k_1, k_2, k_3) = \delta(k_1 - K_1, k_2 - K_2, k_3 - K_3)$.
We begin by calculating the pdf $f^{(1)}(z_1, z_2, z_3)$. Then, the
resultant $z_1$, $z_2$, and $z_3$ become the input to the next trellis
section, where they are rewritten as $k_1$, $k_2$, and $k_3$ in (5),
since we calculate iteratively. Finally, the pdf in the stationary
condition, $f(z_1, z_2, z_3)$, is obtained by iterative calculation of
(5), that is,

$$f(z_1, z_2, z_3) = \lim_{j \to \infty} f^{(j)}(z_1, z_2, z_3). \quad (6)$$

The error state probability is defined as the probability that
the metric of one or more error states is larger than the metric
of the correct state. In Fig. 1, $S_0$ is the correct state, and $S_1$,
$S_2$, and $S_3$ are error states. In this case, the error state
probability is given by

$$\Pr[S \ne S_0] = 1 - \int_{-\infty}^{0}\int_{-\infty}^{0}\int_{-\infty}^{0} f(z_1, z_2, z_3)\, dz_1\, dz_2\, dz_3. \quad (7)$$

Fig. 1: Redefined trellis diagram
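
The metric update of Section III can be sketched in Python as follows. The function `acs_step` implements one trellis transition of the difference path metrics for the all-zero transmitted sequence. The second function is a plain Monte Carlo estimate of the error state probability; it is a swapped-in illustration, not the iterative pdf calculation proposed in the paper, and its initial metrics, step counts and seed are arbitrary assumptions.

```python
import random
from typing import Tuple

def acs_step(k1: float, k2: float, k3: float,
             v1: float, v2: float) -> Tuple[float, float, float]:
    """One transition: given difference metrics (k1, k2, k3) of the error
    states and received values (v1, v2), return the new (z1, z2, z3)."""
    x = v1 + v2
    y = -v1 + v2
    a0 = max(x, k2 - x)        # correct state S0 (its old metric is 0)
    a1 = max(-x, k2 + x)       # error state S1
    a2 = max(k1 + y, k3 - y)   # error state S2
    a3 = max(k1 - y, k3 + y)   # error state S3
    return a1 - a0, a2 - a0, a3 - a0

def estimate_error_state_probability(sigma: float, steps: int = 20000,
                                     burn_in: int = 1000,
                                     seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[S != S0]: fraction of transitions after
    burn-in in which some error-state metric exceeds the correct one."""
    rng = random.Random(seed)
    k = (0.0, 0.0, 0.0)        # assumed initial difference metrics
    errors = 0
    for t in range(steps):
        v1 = 1.0 + rng.gauss(0.0, sigma)   # u1 = +1 plus noise
        v2 = 1.0 + rng.gauss(0.0, sigma)   # u2 = +1 plus noise
        k = acs_step(k[0], k[1], k[2], v1, v2)
        if t >= burn_in and max(k) > 0:
            errors += 1
    return errors / (steps - burn_in)

noiseless = acs_step(0.0, 0.0, 0.0, 1.0, 1.0)
example = acs_step(-1.0, -2.0, -3.0, 0.5, 1.5)
estimate = estimate_error_state_probability(sigma=1.0, steps=3000, burn_in=500)
```

Because only metric differences matter, the correct-state metric is subtracted after every transition, which keeps the state of the recursion three-dimensional.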

Abstract — The storage complexity of bounded distance
decoding for binary-channel convolutional codes
over the BSC is $\approx (2^{1-R} - 1)^{-t}$, where up to t errors are
corrected. We show that the path storage becomes
$\approx 2^{2Rt}$ over the AWGN channel, which is significantly
lower. Thus Gaussian convolutional coding is not only
3 dB more energy efficient, but its decoding is simpler
as well.

Earlier it has been demonstrated [1, 2] that breadth-first
binary convolutional decoders that correct up to t errors over
a BSC must store a number of trellis paths that grows
exponentially with t according to

$$M \approx (2^{1-R} - 1)^{-t}, \quad (1)$$

in which M is the number of paths and R is the code rate
in data bits per channel use. We derive the exponential law
that applies when the channel is instead an additive white
Gaussian noise (AWGN) one. A more attractive law replaces
(1), namely

$$M \approx 2^{2Rt}. \quad (2)$$

We assume that the breadth-first decoder is the M-algorithm,
so that the M in (1) becomes the M needed in that algorithm,
but the wider meaning of M is that of worst-case storage. A
bounded distance decoder (BDD) is one for which a distance
criterion $\Delta$, or alternately an error correction criterion t, is
specified: The BDD must correct any channel disturbance
$\le \Delta$ or t. We assume further that a certain stage far into the
trellis has been reached and that paths have been deleted if
they merge state or exceed the BDD criterion.

Let $x[1], x[2], \ldots, x[n] \in \{\pm 1\}$ be a word of an ordinary
convolutional code of rate $R = (\log_2 \beta)/c$, where $\beta$
is the branching factor at each trellis node and there are
c binary symbols on each trellis branch. When two codewords
lie at Hamming distance $d_H$ from each other, they lie
at squared Euclidean distance $D_E^2 = 4 d_H E_s$ under binary
antipodal transmission. Normalized to the data bit rate, this is
$d_E^2 = 4 d_H E_s / 2E_b = 2R d_H$. Here $E_b$ is the energy per data
bit and $E_s$ is the energy per channel bit. A bounded distance
decoder working on these codewords guarantees correction of
noises of normalized Euclidean square size $\delta^2$ up to a limit

$$d_{E,\mathrm{free}}^2/4 = R\, d_{H,\mathrm{free}}/2. \quad (3)$$

The first event error probability is

$$P_e \le \Pr\{|\eta| > \Delta\} = Q\left(\delta\sqrt{4E_b/N_0}\right) \le (1/2)\, e^{-2\delta^2 E_b/N_0}.$$

For AWGN channel convolutional BDD decoding at simple
rates such as 1/2 or 1/3, the required survivor number M
precisely doubles with each increase by 1/2 in the normed square
distance criterion $\delta^2$. This phenomenon has been observed in
various forms during the 1990s. The law holds for $\delta^2$ up to
the $R\, d_{H,\mathrm{free}}/2$ limit in (3). The needed M in terms of $d_H$ is
therefore $M = 2^{R d_H}$.

The general law for a BDD at $\delta$ in AWGN is actually

$$M = \beta^{\lfloor 2\delta^2 / \log_2 \beta \rfloor}. \quad (4)$$

With $d_H$ set to 2t, the asymptotic form of this in t is
$M \approx 2^{2Rt}$. The source of law (4) is the fact that the words of
a convolutional code occupy the vertices of a hypercube in
signal space. The worst case storage of a breadth-first BDD
decoder at $\delta$ is the largest number of signal points that can
be enclosed by a ball in signal space with radius $\delta$. Let a
Kc-dimensional hypercube be made up from the symbols in the
last K stages of the code trellis. The normalized Euclidean
distance to any cube vertex is $\sqrt{KcR/2}$; the cube comprises
$\beta^K$ code points. Thus a $\sqrt{KcR/2}$-ball encloses at least this
many. From these facts, (4) is at least an underbound. For
any sensible code, it turns out that the ball will not enclose
more points than (4): For this to happen, all codewords must
take the same symbol value in one or more positions.

From the BSC law (1) we can form an approximate ratio
for the survivors needed to make full use of a $d_H$ over the BSC,
compared to full use of the same $d_H$ over the AWGN channel,
where full use means that $\lfloor (d_H - 1)/2 \rfloor$ or fewer errors are
guaranteed never to occur. It is

$$\left[\frac{1}{2^R \sqrt{2^{1-R} - 1}}\right]^{d_H}.$$

Thus the AWGN improvement in complexity grows exponentially
with free distance. For example, at rate 1/2 a code with
free distance 12 needs $(1.099)^{12} = 3.1$ times more survivor
storage if it works with a BSC rather than an AWGN model.
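
The rate 1/2 example can be checked numerically. The sketch below evaluates the BSC law (1) and the AWGN law (2) at t = d/2 and forms their ratio; the function names are hypothetical, and treating t = d/2 as a real number is an assumption that matches the approximate ratio in the text.

```python
import math

def bsc_survivors(rate: float, t: float) -> float:
    """BSC law (1): M ~ (2^(1-R) - 1)^(-t)."""
    return (2.0 ** (1.0 - rate) - 1.0) ** (-t)

def awgn_survivors(rate: float, t: float) -> float:
    """AWGN law (2): M ~ 2^(2Rt)."""
    return 2.0 ** (2.0 * rate * t)

def storage_ratio(rate: float, distance: float) -> float:
    """BSC storage divided by AWGN storage when t = distance / 2."""
    t = distance / 2.0
    return bsc_survivors(rate, t) / awgn_survivors(rate, t)

def ratio_base(rate: float) -> float:
    """Per-unit-distance growth factor 1 / (2^R * sqrt(2^(1-R) - 1))."""
    return 1.0 / (2.0 ** rate * math.sqrt(2.0 ** (1.0 - rate) - 1.0))

base_half = ratio_base(0.5)
ratio_half_12 = storage_ratio(0.5, 12)
```

For R = 1/2 the base evaluates to roughly 1.099, and raising it to the 12th power gives a storage ratio of roughly 3.1, in line with the figures quoted above.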

Abstract — We clarify the relation between trellis
degeneration (TDG) and symbol reliability estimation
using the bidirectional Viterbi algorithm (BIVA) for
the case of Quick Look-In (QLI) codes.

I.  Introduction 

Based on the observation that the syndrome sequence
corresponding to a received sequence z contains many segments
with value 0 (i.e., zero-strings) under high SNR conditions,
TDG can be realized for a syndrome trellis. Here, “degeneration”
means to identify the error-free interval for each zero-string
and to exclude such intervals from the normal decoding.
In connection with this fact, we showed that TDG is also
possible for a code trellis, if Scarce State Transition (SST)
Viterbi decoding is applied to QLI codes. It is shown that the
hard-decision input to the main decoder in an SST Viterbi decoder
is just the syndrome in the case of QLI codes. This fact enables
the code trellis corresponding to the main decoder to be
degenerated. On the other hand, it is known that symbol
reliability values are obtained by computing a Viterbi algorithm
(VA) in two directions over a block of coded symbols. We call
this scheme the Soft-BIVA (this is equivalent to the Max-Log-MAP).
Then we showed that in the case of QLI codes, the
symbol reliability values are obtained by applying the BIVA
either to the code trellis for the main decoder or to the
corresponding syndrome trellis [1]. As a result, the problem of
finding the relation between TDG and symbol reliability
estimation using the Soft-BIVA has become crucial. In this paper,
we show that the TDG process can be effectively utilized for
symbol reliability estimation.

II.  TDG  and  Symbol  Reliability  Estimation  (1) 

In the TDG process for a zero-string $[t, t']$, forward
decoding is performed from each state at level t and backward
decoding is performed from each state at level $t'$. Assume
that TDG is successful and the sub-interval $[\tau, \tau']$ in which the
maximum-likelihood (ML) path surely passes through state (0)
(the all-zero state) has been identified. Then the ML bits (i.e.,
information bits on the ML path) corresponding to $[\tau, \tau']$ can
be regarded as early detected information bits in the sense of
Frey and Kschischang [2]. Note that if the degenerated sections
are cut out of the original trellis, the remaining trellis
sections are divided into sub-trellises terminated with state
(0) at both ends. In this case, the reliability values for the
ML bits which are contained in the part of a sub-trellis not
affected by the TDG process are obtained by applying the
Soft-BIVA to the sub-trellis under consideration.

III.  TDG  and  Symbol  Reliability  Estimation  (2) 


Next, consider an interval $[t, t']$ affected by the TDG process.
Note that the TDG process can be viewed as trellis integration.
Hence, it is reasonable to evaluate the reliability value
not for each ML bit but for the integrated ML branch. Assume
that TDG is successful and the degenerated sub-interval $[\tau, \tau']$
has  been  identified.  Let  Xm(state  in  — ►  state  jo)  be  the  inte¬ 
grated  ML  branch  for  [(,  t'].  Then  the  reliability  value  for 
can  be  evaluated  by 

A*  =  ln|>4(2,  Xr+X^+Xj)]-  max  ln[/i(2,  X3  +  Xm'  +  Xl )], 

m^m 

(!) 

where  Xmi  is  any  integrated  branch  other  than  and 

X* ( X* )  and  X£ (Xf)  are  the  best  paths  linked  with  X^(Xmi) 
obtained  by  performing  forward  decoding  and  backward  de¬ 
coding,  respectively.  Since  all  the  integrated  branches  pass 
through  state  (0)  in  [r,  r%  A*  can  be  reduced  to 

min{(o',0  +  7.'0o)  - max(o,  +  7(0),  (70 )0  +  P10)~ max^o'j  +/?>)}, 

*9**0  3*3  0 

(2) 

where  a ,  and  / 3j  denote  the  metrics  of  the  best  paths  for  state 
i  at  level  t  and  state  j  at  level  t'  obtained  using  forward  de¬ 
coding  and  backward  decoding,  respectively,  and  j(0  and  7oj 
denote  the  metrics  for  the  path  segments  i  — >  (0)(a<  r)  and 
(0)(at  r')  — f  j,  respectively.  Note  that  7(0  and  7^  are  ob¬ 
tained  when  the  TDG  process  for  [t,t']  terminates.  Also,  a, 
and  are  obtained  when  the  Soft-BIVA  is  applied  to  each 
sub-trellis  after  TDG.  Hence,  no  extra  computations  are  re¬ 
quired  for  calculating  the  reliability  value  A*  for  IJ,. 

IV.  Relationship  between  Reliability  Values 
for  an  ML  Branch  and  ML  Bits 

Let  X^  be  the  integrated  ML  branch  for  [t,  tf].  Let  k  be  any 
level  between  t  and  t' .  Also,  let  xa  be  the  path  with  uk  ^  u*k 
which  attains  the  reliability  value  A£  for  the  ML  bit  u*k.  Note 
that  the  restriction  of  xa  to  [t,  t']  does  not  necessarily  belong 
to  a  class  of  integrated  branches  for  [t,  t'].  Then  let  st  and 
sti  be  the  states  which  xa  passes  through  at  levels  t  and  t', 
respectively.  Denote  by  Xa  the  best  path  connecting  st  and 
sti.  In  this  case,  if  Xa  X£,  holds,  then  Ak  is  lower-bounded 
by  the  reliability  value  A*  for  X^. 

Abstract — A good assignment of binary codewords
to cells is necessary for a scalar quantizer to be robust
to channel errors. We investigate a redundant assignment
method that uses Snake-in-the-Box codes, which
have a desirable distance-preserving property.

I.  Introduction  and  System  Description 

An index assignment (IA) associates length-n binary codewords
$w_1, \ldots, w_N$ to the cells $S_1, \ldots, S_N$ of a scalar
quantizer. A source sample x is encoded into the $w_i$ corresponding
to the cell $S_i$ in which x lies; $w_i$ is sent over a binary
symmetric channel (BSC) with error probability $\epsilon$; and a decoder
produces $E\{X|r\}$ as the reproduction $\hat{x}$ of x, where r is the
BSC output word.

This paper considers redundant IA's, i.e. $n > \log_2 N$. One
approach is to use a set of codewords with as large a minimum
distance ($d_{\min}$) as possible, so that some channel errors can be
corrected, and then to assign codewords to cells to minimize
the effects of uncorrectable error patterns, cf. [1]. Another
approach, pursued here, is to allow $d_{\min} = 1$, but use IA's
that mitigate rather than correct channel errors, i.e. a few
such errors should cause only a small error in $\hat{x}$.
Snake-in-the-box (SIB) codes (also called circuit codes) can be
used to make IA's of this type.

An $(n, s, N)$ SIB code has the distance-preserving properties
that $d_H(w_i, w_{i+1}) = 1$, and $d_H(w_i, w_j) < s \Rightarrow |i - j| =
d_H(w_i, w_j)$, where $d_H$ is Hamming distance, and $s \ge 2$ is an
integer called the spread of the code. This allows it to mitigate
$\lfloor (s-1)/2 \rfloor$ errors. SIB codes were first introduced as robust index
assignments for A/D converters [2]. Over the years, the family
of known SIB codes has gradually enlarged, cf. [3]. However,
they have never been studied as general purpose index assignments
for noisy channel quantizers. This paper presents the
initial results of such a study.
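
As an illustration of the distance-preserving property just stated, the following minimal Python sketch tests whether a list of codewords, taken in index order, satisfies it for a given spread. It checks the open-path form exactly as written above (using $|i - j|$, not a cyclic index distance); the function names and the two example lists are hypothetical and are not codes from the paper.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


def has_spread(words: list[str], s: int) -> bool:
    """Check the two distance-preserving conditions for spread s."""
    # Condition 1: consecutive codewords differ in exactly one bit.
    for i in range(len(words) - 1):
        if hamming(words[i], words[i + 1]) != 1:
            return False
    # Condition 2: any pair closer than s in Hamming distance must be
    # exactly that far apart in index.
    for i, j in combinations(range(len(words)), 2):
        d = hamming(words[i], words[j])
        if d < s and d != j - i:
            return False
    return True


# Hypothetical examples: a short snake in the 3-cube and the 3-bit Gray code.
snake = ["000", "001", "011", "111", "110"]
gray = ["000", "001", "011", "010", "110", "111", "101", "100"]
```

Under these definitions the short snake passes for spread 2 but not for spread 3, while the Gray code already fails for spread 2, because words that are far apart in index can be Hamming neighbours.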

II.  Comparing  SIB  Codes  to  Other  Approaches 

Representative results are given in Fig. 1, which, for an $N = 32$
level quantizer, compares the source SNR (SSNR) due to an
(8,3,32) SIB index assignment to that due to other IA’s, all
computed using conventional means. The source sample is
Gaussian, and the quantizer is optimized for it, assuming no
channel errors. Because the various IA’s cannot all be chosen
to have the same codelengths $n$, to make fair comparisons,
instead of fixing the BSC error probability $\epsilon$, we fix an
underlying analog channel (AWGN or AWGN plus Rayleigh fading)
and use the $\epsilon$ that results from BPSK modulation. The
channels are parameterized by their CSNR, defined as $E_s/N_0$,
where $E_s$ is the average received energy per data sample, and
$N_0/2$ is the PSD of the white Gaussian noise.
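
The mapping from CSNR to the BSC error probability can be sketched as follows. This is only an illustration under assumptions that the text does not spell out: coherent BPSK with hard decisions, the energy $E_s$ split equally over the $n$ bits of a codeword, and, for the fading case, flat Rayleigh fading with perfect channel knowledge at the receiver. The function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bsc_epsilon_awgn(csnr_db: float, n_bits: int) -> float:
    """Bit error probability of coherent BPSK on AWGN (assumed model)."""
    es_n0 = 10.0 ** (csnr_db / 10.0)
    eb_n0 = es_n0 / n_bits  # assumption: energy shared equally by n bits
    return q_function(math.sqrt(2.0 * eb_n0))


def bsc_epsilon_rayleigh(csnr_db: float, n_bits: int) -> float:
    """Average BPSK bit error probability on flat Rayleigh fading (assumed model)."""
    eb_n0 = 10.0 ** (csnr_db / 10.0) / n_bits
    return 0.5 * (1.0 - math.sqrt(eb_n0 / (1.0 + eb_n0)))
```

At a fixed CSNR, a longer codeword leaves less energy per bit and hence a larger error probability, which is why the comparison fixes the analog channel rather than the error probability itself.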

In  addition,  the  figure  shows  the  SSNR  resulting  from  us¬ 
ing  the  [9,5]  shortened  Hamming  code,  which  is  an  interesting 

This work was supported by NSF grant NCR-9415754.


comparison  because  it  corrects  single  errors,  whereas  spread-3 
SIB codes mitigate single errors. Also shown is the SSNR due
to  the  nonredundant  natural  binary  code  (NBC)  index  assign¬ 
ment,  and  that  due  to  the  NBC  index  assignment  followed  by 
a  Hamming  code,  whose  input  length  is  not  constrained  to 
match  the  output  length  of  the  NBC.  Note  that  the  latter  is 
a  tandem  system  rather  than  an  index  assignment.  (For  the 
Rayleigh  channel,  only  the  [3,1]  Hamming  code  results  are 
shown,  since  they  turned  out  to  be  best.) 

One  may  see  from  Fig.  1  that  the  SIB  index  assignment 
is  better  than  all  other  strategies,  except  at  high  CSNR  for 
the  AWGN  channel,  where  the  tandem  Hamming  codes  are  a 
little  better.  In  our  view,  the  SIB  approach  represents  the  bet¬ 
ter  overall  strategy  because  it  is  significantly  better  at  small 
CSNR,  while  only  slightly  worse  at  high  CSNR.  In  particu¬ 
lar,  as  CSNR  decreases,  the  channel  error  mitigation  strategy 
leads  to  a  more  graceful  degradation  of  SSNR  than  does  the 
channel  error  avoidance  strategy. 

III.  Conclusion 

SIB codes have much potential to be used as an IA for joint
source-channel coding. Their distance-preserving property
protects against channel-induced distortion, in a
“bend-but-don’t-break” fashion.


Fig.  1:  Gaussian  Source,  32  level  SQ. 

I. Introduction

Consider a system designed for conveying a $d$-dimensional random
source vector, $X$. A sample, $x$, from the source is fed to
the encoder $e$, producing an index $i = e(x) \in \{0, \dots, N - 1\}$,
where $N = 2^L$. The $L$ bits of $i$ are then fed to a binary
symmetric channel (BSC), resulting in the output $j$ producing a
codevector $y_j$ from the decoder codebook $\{y_j\}_{j=0}^{N-1}$. We
assume that the BSC corresponds to a Gaussian channel with
noise variance $\sigma^2$ and with binary input in $\{\pm 1\}$.

At the transmitter, the index $i$ also chooses a codevector $z_i$
from the encoder codebook, $\{z_i\}_{i=0}^{N-1}$, and the residual vector
$e = x - z_i$ is then formed. This vector is scaled by the constant
$\alpha$ and transmitted over a discrete-time analog-amplitude
Gaussian channel. (The scaling constant $\alpha$ regulates the
transmission power.) The received vector $u = \alpha \cdot e + w$,
where $w$ is zero-mean Gaussian with independent components
of variance $\sigma^2$, is multiplied by a re-scaling constant $\beta$ and
then added to the codevector $y_j$, resulting in an estimate of
the transmitted source vector according to

$$\hat{x} = \beta u + y_j.$$

Hence, the reproduction $\hat{x}$ is based on information transmitted
via both a digital and an analog channel. This is the key principle
behind the work of this paper. Related previous work
can be found in, e.g., [1, 2].

II.  System  Design  and  Performance 

We will now present optimality criteria for the described hybrid
digital-analog (HDA) system, resulting in a design algorithm striving
to minimize the distortion $D = E\|X - \hat{X}\|^2$ under a constraint on the
transmitted power $P_a$ per channel use in the analog channel.
More precisely, the aim of the design is to find $e(x)$, $\{z_i\}$, $\{y_j\}$
and $\beta$ such that $D$ is minimized, under the constraint that $\alpha$
is chosen such that $P_a = 1$ is satisfied at all times.

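Optimality for a fixed encoder. Assume that $e(x)$ is known
and fixed, and define

$$\hat{x}(j) \triangleq E[X \mid J = j], \qquad f_{kj} \triangleq \sum_{i=0}^{N-1} \Pr(I = i \mid J = j) \Pr(J = k \mid I = i),$$

and the matrices

$$Y = [y_0 \cdots y_{N-1}], \qquad \hat{X} = [\hat{x}(0) \cdots \hat{x}(N-1)], \qquad (F)_{kj} = f_{kj}.$$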

Then the optimal encoder and decoder codebooks, $\{z_i\}$ and
$\{y_j\}$, can be jointly determined by solving the equation

$$Y (I_N - \gamma F) = (1 - \gamma) \hat{X},$$

where $I_N$ is the $N \times N$ identity matrix and $\gamma = \alpha\beta$, and then
letting $z_i = m_y(i) = E[y_J \mid I = i]$. Furthermore, the optimal $\beta$
can be found (independently) as $\beta = \alpha^{-1}/(1 + \sigma^2)$.
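
To make the codebook equation concrete, here is a minimal pure-Python sketch that builds $F$ for an index sent over a BSC and solves $Y(I_N - \gamma F) = (1 - \gamma)\hat{X}$ by the fixed-point iteration $Y \leftarrow (1 - \gamma)\hat{X} + \gamma Y F$. The iteration is a choice made here for brevity, not a method taken from the paper; it converges because $\gamma < 1$ and each column of $F$ sums to one. The relation $\gamma = 1/(1 + \sigma^2)$ follows from $\gamma = \alpha\beta$ and the expression for $\beta$ above. All function names are hypothetical.

```python
def gamma_from_noise(sigma2: float) -> float:
    """gamma = alpha * beta with beta = 1 / (alpha * (1 + sigma2))."""
    return 1.0 / (1.0 + sigma2)


def bsc_transition(num_bits: int, eps: float) -> list[list[float]]:
    """p[i][j] = Pr(J = j | I = i) for num_bits index bits sent over a BSC."""
    size = 1 << num_bits
    rows = []
    for i in range(size):
        row = []
        for j in range(size):
            flips = bin(i ^ j).count("1")
            row.append(eps ** flips * (1.0 - eps) ** (num_bits - flips))
        rows.append(row)
    return rows


def f_matrix(p_chan, p_index):
    """f[k][j] = sum_i Pr(I = i | J = j) * Pr(J = k | I = i)."""
    n = len(p_index)
    p_j = [sum(p_index[i] * p_chan[i][j] for i in range(n)) for j in range(n)]
    post = [[p_index[i] * p_chan[i][j] / p_j[j] for j in range(n)]
            for i in range(n)]  # post[i][j] = Pr(I = i | J = j)
    return [[sum(post[i][j] * p_chan[i][k] for i in range(n))
             for j in range(n)] for k in range(n)]


def solve_decoder_codebook(x_hat, f, gamma, tol=1e-12, max_iter=100000):
    """Iterate Y <- (1 - gamma) X_hat + gamma Y F; x_hat[j] is the vector x_hat(j)."""
    n, d = len(x_hat), len(x_hat[0])
    y = [list(v) for v in x_hat]
    for _ in range(max_iter):
        new_y = [[(1.0 - gamma) * x_hat[j][c]
                  + gamma * sum(y[k][c] * f[k][j] for k in range(n))
                  for c in range(d)] for j in range(n)]
        change = max(abs(new_y[j][c] - y[j][c])
                     for j in range(n) for c in range(d))
        y = new_y
        if change < tol:
            break
    return y
```

For a noiseless index channel $F$ is the identity and the iteration returns $Y = \hat{X}$; as the bit error probability grows, the codevectors are pulled toward each other.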

1The  work  of  M.  Skoglund  was  supported  in  part  by  the  Swedish 
Research  Council  for  Engineering  Sciences. 

2The  work  of  F.  Alajaji  was  supported  in  part  by  the  Natural 
Sciences  and  Engineering  Research  Council  of  Canada 


Optimality for a fixed decoder codebook. Now assume that
$\{y_j\}$ is given, that $\{z_i\}$ is chosen as $z_i = m_y(i)$, and that
$\beta = \alpha^{-1}/(1 + \sigma^2)$, as above. The optimal encoder then is

$$e(x) = \arg\min_i \left\{ (1 - \gamma) \cdot \|x - m_y(i)\|^2 + g_i \right\},$$

where $g_i = E[\|y_J\|^2 \mid I = i] - \|m_y(i)\|^2$. Based on these
results, the system can be (locally) optimized at an assumed
channel SNR, $1/\sigma^2$, using an iterative approach similar to the
well-known generalized Lloyd algorithm for VQ design.

Motivated by a broadcast scenario, we illustrate below the
performance (signal-to-distortion ratio versus SNR) of employing
a fixed encoder and an adaptive decoder (which adapts to a
varying SNR), denoted by FE$_\star$AD, where $\star$ is the design SNR
of the encoder. We also illustrate some benchmark schemes.
All systems use a rate of two channel uses per source sample.
The source is Gauss-Markov with correlation 0.9.


[Figure: signal-to-distortion ratio versus channel SNR; horizontal axis from -5 to 20 dB.]


Dashed lines from above at SNR = 15 dB: The Shannon
bound (distortion-rate function evaluated at channel capacity);
a purely analog system (transmits each source sample
twice, minimum mean-square error receiver); a purely digital
tandem system (source-optimized VQ with $d = 8$ and $L = 8$,
rate-1/2 Turbo code with $(n, k) = (2048, 1024)$ and generators
(37, 21)). Solid lines from above at SNR = 15 dB: an HDA
system with source-optimized VQ; and HDA-FE$_\star$AD systems
with $\star$ = 10, 5, 0 dB. All HDA systems use $d = 8$ and $L = 8$.

We  observe  that  the  HDA  systems  outperform  the  tandem 
system  and  the  analog  system  (at  high  SNRs).  In  particu¬ 
lar  we  note  the  graceful  improvement  of  the  HDA  systems,  as 
opposed  to  the  leveling-off  in  performance  of  the  tandem  sys¬ 
tem.  We  also  observe  that  the  performance  can  be  improved 
at  low  SNRs  using  the  optimization  procedure. 

Abstract — In [1], it was proved that distortion due
to a binary symmetric channel is minimized by a linear
labelling. In this paper, we show how to obtain
an asymptotically optimal linear labelling which also
minimizes the source distortion for Gaussian sources.
This linear labelling is based on the notion of component
diversity, which can be obtained by algebraic
constructions derived from Number Theory [3].

I.  Problem  statement 

In this work, we present an approach for Joint Source-Channel
Coding based on the minimisation of, first, channel
distortion and then, source distortion. This problem has been
traditionally treated from the source coding point of view [2].
First, the source codebook is matched to the source statistics
in order to minimize distortion due to the source, and then,
the labelling, i.e., the mapping between the source codebook
and the channel codebook, is optimized in order to minimize
distortion due to the channel. Our approach follows the channel
point of view. We propose to optimize, first, the channel
distortion and then, the source distortion.

In  [1],  it  has  been  proved  that,  on  binary  symmetric  chan¬ 
nels,  the  channel  distortion  is  minimized  if  the  vector  quan¬ 
tizer  can  be  expressed  as  a  linear  transform  of  a  hypercube. 
We  propose  to  extend  this  approach  by  finding  a  set  of  linear 
transforms which minimizes the channel distortion, along with
the  distortion  of  Gaussian  sources. 

II. Linear labelling to minimize channel distortion

By constraining the labelling to be linear, we solved the
problem of minimisation of the channel distortion. Now, we
are concerned with the problem of source distortion minimisation.
We focus our investigation on the case of a memoryless
zero-mean Gaussian source with variance $\sigma_x^2$. With our
assumptions, one can express the points $y$ of the source codebook
as a linear function of the points $h$ of the hypercube,

$$y = G h,$$

with $G$ being a matrix representing the linear transform.
$G = [g_{ij}]$ has $d$ rows and $n$ columns.

III.  Minimisation  of  source  distortion  (Gaussian 
sources) 

By looking at the expression of the components of $y$,

$$y_i = \sum_{j=1}^{n} g_{ij} h_j, \qquad i = 1, 2, \dots, d,$$

we can see that, in order to mimic a memoryless Gaussian
source, $y$ must have the same distribution. To obtain this
distribution, we need to apply the central limit theorem to
the independent random variables $h_j$. In order to ensure the
Gaussianity of $y$, we need that all components $g_{ij} h_j$ of the
summation be nonzero, for any vector $h$.
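
The component expression above can be turned into a small codebook generator. The sketch below assumes hypercube points with coordinates in $\{-1, +1\}$ (the text does not state the alphabet) and uses a made-up $2 \times 4$ matrix whose entries are all nonzero; neither the matrix values nor the function name come from the paper.

```python
from itertools import product


def linear_codebook(g):
    """Return all points y = G h for h ranging over the hypercube {-1, +1}^n."""
    n = len(g[0])
    return [tuple(sum(g_ij * h_j for g_ij, h_j in zip(row, h)) for row in g)
            for h in product((-1.0, 1.0), repeat=n)]


# Hypothetical 2 x 4 transform with nonzero entries (illustration only).
G_EXAMPLE = [
    [0.69, 0.59, 0.39, 0.14],
    [0.14, -0.39, 0.59, -0.69],
]

codebook = linear_codebook(G_EXAMPLE)
```

Each of the $2^n$ hypercube labels maps to one codebook point, and every coordinate of a point is a signed sum of $n$ nonzero terms, which is the situation in which the central limit argument is applied.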

This  property  can  be  obtained  with  “maximum  component 
diversity”  constellations  [3].  Let  M  be  the  n  x  n  generator 
matrix  of  a  “maximum  component  diversity”  constellation. 
Then  we  can  obtain  the  previous  property  by  taking  G  equal 
to  any  set  of  d  rows  of  M. 

We  can  show  how  to  construct  full  diversity  rotations  of 
dimension $n = 2^m$, $m$ being a positive integer. As an example,
we construct a full diversity rotation matrix with $m = 2$. In
this case, we obtain

$$M = \frac{1}{\sqrt{2}} \left[ \omega_{k_{ij}} \right]_{i,j = 1, \dots, 4} \qquad \text{with} \qquad \omega_k = \cos\left( \frac{k\pi}{2^{m+2}} \right),$$

where each index $k_{ij}$ is an odd integer.

Assume that we need a quantizer of dimension, say, 2.
Take, as an example, the first and third rows of $M$ to obtain
the codebook represented below.


2-dimensional vector quantizer

Abstract — A transmitted signal is decomposed
into two parts which are then encoded using fixed-
and variable-length coding, respectively. Compared
to conventional variable-length codewords
with synchronization symbols or fixed-length coding
strategies, the proposed method enjoys a better
distortion-rate performance on particular channels.

I. Joint Fixed- and Variable-length Coding

A fixed-length coding strategy is optimal for a uniformly
distributed (flat) source probability density function (pdf).
For a zero-mean memoryless Gaussian source, it is possible to
find an approximately flat region centered around the origin
of its pdf. This region can be encoded using fixed-length codewords
at a slight loss in compression efficiency. Meanwhile,
it is still possible to encode the tail of this distribution
by a variable-length coding scheme. A typical system
is shown below.


In this system, when a signal value, $x$, saturates in the
fixed-length coded quantizer, $Q_f$, the difference between $x$ and its
fixed-length quantized version, $\hat{x}_f$, is formed and the switch between
fixed and variable-length quantizers is closed. The difference
is then passed to the residual stage, where this saturation offset,
$x_s$, is quantized by the variable-length quantizer, $Q_v$, and the
quantized residue $\hat{x}_v$ is encoded using a variable-length code.
After transmission of $\hat{x}_f$ and $\hat{x}_v$, the receiver checks whether a
signal value is saturated or not, using the received fixed-length
coded part, $\hat{x}_f$, and if so, it will decode and add the received
quantized difference, $\hat{x}_v$, on top of $\hat{x}_f$.
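
The two-stage idea can be sketched in a few lines of Python. This is a simplified illustration, not the system used in the paper: it assumes a uniform fixed-length quantizer whose two outermost indices flag saturation, a uniform residual quantizer with an unbounded integer index, and it omits the entropy coding of the residual index. All names and parameters (`encode`, `decode`, `s`, `levels`, `step_v`) are hypothetical.

```python
def encode(x: float, s: float, levels: int, step_v: float):
    """Return (fixed_index, residual_index); residual_index is None if unsaturated."""
    inner = levels - 2          # cells covering the non-saturating region
    half = s / 2.0              # region is [-s/2, s/2)
    if x < -half:
        idx = 0                 # lower saturation flag
    elif x >= half:
        idx = levels - 1        # upper saturation flag
    else:
        idx = 1 + min(int((x + half) / s * inner), inner - 1)
    if idx in (0, levels - 1):
        x_f = -half if idx == 0 else half
        return idx, round((x - x_f) / step_v)   # quantized saturation offset
    return idx, None


def decode(idx: int, res, s: float, levels: int, step_v: float) -> float:
    """Rebuild the sample; the residual is added only for saturated indices."""
    inner = levels - 2
    half = s / 2.0
    if idx == 0:
        return -half + res * step_v
    if idx == levels - 1:
        return half + res * step_v
    return -half + (idx - 1 + 0.5) * s / inner
```

With `s = 2.0`, `levels = 8` and `step_v = 0.25`, a sample inside the region is described by the fixed index alone, while a sample outside it is reconstructed to within half a residual step.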

The parameter $S$ gives the width of the reserved non-saturating
region of the signal pdf, and $R_{fixed}$ denotes the bit rate
reserved for the fixed-length quantizer, $Q_f$, from the overall
total rate, $R_{total}$. For a given channel and $R_{total}$ value, our
design goal is to find the optimal $(S, R_{fixed})$ pair so that the
distortion of the reconstructed signal at the receiver is minimized.
Since, in noisy channels, an analytical distortion analysis
for variable-length coded data is not possible, the “best”
$(S, R_{fixed})$ pair, rather than the optimal one, is found by using
operational rate-distortion characteristics.

II. Simulations

The fixed-length quantizer utilized in the simulations is a
derivative of the Lloyd-Max quantizer (LMQ) [1]. The
variable-length coded, or saturated, component is quantized using an
entropy-constrained scalar quantizer (ECSQ) [2], whose output indices
are entropy coded using Huffman coding. This variable-length
bit-stream is protected from error propagation by carving it up
into slices and adding end-of-slice (EOS) markers. The simulated
channel is taken to be a simple binary symmetric channel (BSC),
and simulations are conducted for different bit error rates (BER).


SNR vs. BER: Gaussian memoryless source for fixed-length,
variable-length and joint fixed- and variable-length coding
(solid line) for (left-to-right) $R_{total}$ = 3, 4, 5 and 6 b/p.

III.  Conclusion 

For some BER range (between $10^{-4}$ and error-free), separating
a symbol into two (saturating and non-saturating) parts
and encoding these parts appropriately is advantageous compared
to variable- or fixed-length-only strategies. The most
attractive property of this approach is its capability for dividing
any source into subsources with different error immunities.
Hence, unequal channel error protection is possible at
the symbol level for any source.

Abstract — In next generation wireless communication
systems, packet-oriented data transmission
will be implemented in addition to standard mobile
telephony. Designing efficient schemes for packet
transmission on top of an existing connection-oriented
CDMA system will be a challenge for system designers.
In this work, we take an information-theoretic
view of some simple protocols for reliable packet
communication based on “hybrid ARQ”.

In order to support new services (e.g., wireless mobile access
to the Internet), next generation wireless communication
systems  will  implement  packet-oriented  data  transmission  in 
addition  to  standard  mobile  telephony.  This  implies  bursty 
sporadic  communication  from  a  large  population  of  users,  that 
may  require  instantaneous  large  data  rates  and  very  small  er¬ 
ror  probabilities  for  a  short  time.  On  the  other  hand,  next 
generation  systems  will  be  based  mainly  on  CDMA,  which  is 
suited  to  continuous-mode  transmission  and  (at  least  in  its 
current conventional implementation [1]) it requires closed-loop
power control. Then, a challenge for future system designers
is to implement efficient schemes for packet transmission
on top of an existing connection-oriented CDMA system,
preserving  the  uncoordinated  access  flexibility  of  the  latter.  In 
essence,  next  generation  wireless  systems  should  be  regarded 
as  “composite”  systems  where  several  subsystems  with  very 
different  power,  rate,  reliability  and  delay  constraints  will  co¬ 
exist,  sharing  the  same  bandwidth. 

Motivated by the above consideration, we take an
information-theoretic view of some simple protocols for reliable
packet communication based on “hybrid ARQ”, i.e.,
on combining channel coding and Automatic Retransmission
reQuest (ARQ). We model low-power low-rate continuous-mode
traffic as background white Gaussian noise for the high-rate
high-power bursty users. Random user activity prevents
closed-loop power control and user coordination. Then, we
assume that users transmit their signal bursts at very high
instantaneous power and in a completely uncoordinated way.
The receiver is formed by a bank of conventional single-user
decoders, and does not implement joint decoding. We refer to
this model as the Gaussian collision channel [3]. The transmission
of each user is governed by a hybrid ARQ protocol,
designed in order to achieve very low error probability.

We consider a slotted multiple access Gaussian channel
with fading. We study the system performance in terms of
throughput (total bit/s/Hz) and average delay for three simple
idealized hybrid ARQ protocols: a coded version of Aloha,
a repetition scheme with maximal-ratio packet combining and
an incremental redundancy scheme with general coding. By
applying the renewal-reward theorem [4], we obtain a closed-form
throughput formula under a delay constraint (time-out)
and code rate constraint. Since we consider random coding
and typical set decoding, our results are independent of the
particular coding/decoding technique and should be regarded
as a limit in the information-theoretic sense. Then, we study
asymptotic behaviors with respect to various system parameters.
The system throughput is compared to that of a conventional
CDMA with conventional decoding. Interestingly, the
ARQ system is not interference-limited even if no multiuser
detection or joint decoding is used (arbitrarily high throughput
can be obtained by increasing the user transmit power),
as opposed to conventional CDMA.
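
The renewal-reward argument says that long-run throughput equals the expected reward per renewal cycle divided by the expected cycle length. The sketch below applies it to the simplest case only, under assumptions made here for illustration: each transmission attempt succeeds independently with the same probability, no combining across attempts is used, and a packet is dropped after a fixed number of attempts. It is not the closed-form formula of the paper, and the function name and parameters are hypothetical.

```python
def arq_throughput(rate_bits: float, p_success: float, max_tx: int) -> float:
    """Expected delivered bits per slot for a memoryless ARQ with a time-out."""
    q = 1.0 - p_success
    # Reward: rate_bits are delivered if any of the max_tx attempts succeeds.
    expected_reward = rate_bits * (1.0 - q ** max_tx)
    # Cycle length: attempt m takes place only if the first m - 1 attempts failed.
    expected_slots = sum(q ** (m - 1) for m in range(1, max_tx + 1))
    return expected_reward / expected_slots
```

In this memoryless case the ratio reduces to `rate_bits * p_success` for any time-out, so the time-out affects delay and packet loss but not throughput; schemes with packet combining change the per-attempt success probabilities and therefore the ratio.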

As  a  byproduct  of  this  analysis,  we  provide  a  stronger  op¬ 
erational  meaning  to  the  information  outage  probability  of 
block-fading  channels  and  we  obtain  the  closed  form  prob¬ 
ability  distribution  of  signal-to-interference  plus  noise  ratio 
(SINR)  with  Rayleigh  fading  and  a  Poisson-distributed  num¬ 
ber  of  interferers,  extending  the  result  of  [5]. 

In  the  full  paper  [2],  we  give  all  the  details  of  the  proofs 
and  a  wide  range  of  numerical  results  illustrating  the  perfor¬ 
mances  of  the  examined  ARQ  protocols,  as  well  as  a  com¬ 
parison  with  conventional  CDMA  (another  form  of  “collision 
channel”)  which  shows  that  especially  for  high  SNR  the  slot¬ 
ted  ARQ  system  provides  great  potential  advantages.  In  fact, 
it  is  well-known  that  conventional  CDMA  is  interference  lim¬ 
ited  while  the  slotted  ARQ  system  is  not. 

As a conclusion, we can say that as far as packet data
communication is concerned, it is more useful to spend the
feedback channel to provide ACK/NACK for the ARQ protocol
rather than to provide power control commands.

Abstract —

We  establish  a  region  of  reliably  received  rates  for 
time-slotted  ALOHA  systems.  We  combine  concepts 
from  multiple-access  channels  and  broadcast  channels 
to  determine  a  capacity  region  for  a  single  transmis¬ 
sion  of  a  packet  (which  is  long  enough  to  achieve  ca¬ 
pacity)  in  an  ALOHA  system  and  determine  capacity- 
achieving  strategies. 

I.  Introduction 

The  flexibility  of  ALOHA  systems,  which  were  first  proposed 
in  1970  by  Abramson,  makes  them  an  attractive  option  for 
wireless  data  applications.  In  the  original  ALOHA  system,  if 
a  collision  among  packets  occurs  at  the  receiver,  those  packets 
are  discarded  and  the  users  retransmit  those  packets.  Several 
coding  schemes  have  been  proposed  for  ALOHA  packets  to 
allow  at  least  part  of  the  data  in  the  packets  to  weather  out 
one  or  several  collisions.  Depending  on  the  presence  or  absence 
of  other  users,  each  user  will  be  able  to  achieve  different  rates. 
Since  some  transmitted  bits  will  be  lost  owing  to  collisions,  we 
consider  a  maximum  reliably  received  rate  region  rather  than 
maximum  reliably  transmitted  rate  region  ([4]). 

II.  Single  packet  system. 

We  consider  a  time-slotted  ALOHA  system  with  two  users. 
Users,  at  each  time  slot,  determine  according  to  a  Bernoulli 
process  whether  to  transmit.  A  packet  occupies  one  time  slot, 
which  is  long  enough  so  that  Shannon  capacity  is  approxi¬ 
mately  achieved  over  that  time  slot  (i.e.  the  slot  duration 
is long enough so that $P_e$ is approximately zero when transmitting
at the Shannon rate). The users share an AWGN
channel with noise variance $\sigma_N^2$. Users 1 and 2 have average
power constraints $\sigma_1^2$ and $\sigma_2^2$.

We  combine  concepts  from  rate  splitting  for  multiple-access 
communications  ([5])  and  broadcast  channels  ([1],  [2]).  The 
rationale  behind  our  approach  springs  from  the  following  ob¬ 
servation.  In  multi-access  channels,  rate  splitting  achieves  ca¬ 
pacity  by  creating  virtual  users  and  decoding  all  users  us¬ 
ing  interference  cancellation.  In  a  degraded  AWGN  broadcast 
channel,  the  low  resolution  code  is  decoded  by  considering 
the  high  resolution  code  as  noise.  Hence,  there  is  similar¬ 
ity  between  the  decoding  mechanism  for  achieving  capacity 
in  multiple-access  and  in  degraded  broadcast  channels  ([3]). 
In the system we consider, a user codes to transmit over two
possible channels: a channel with the other user present and
a channel without the other user.

We begin by presenting a coding scheme. As for rate splitting,
we divide user 1 into two users, $U_1'$ and $U_1''$, which send
independent WGN signals with variance $\beta\sigma_1^2$ and $(1 - \beta)\sigma_1^2$,
respectively. User 2 maps to a single user, $U_2$. As in broadcast
channels, each of the users we constructed sends two messages
on two separate signals. $U_1'$ sends signals $LR_1'$ and $HR_1'$,
which are independent WGN signals with variance $\alpha_1'\beta\sigma_1^2$ and
$(1 - \alpha_1')\beta\sigma_1^2$, respectively. $U_1''$ sends signals $LR_1''$ and $HR_1''$,
which are independent WGN signals with variance $\alpha_1''(1 - \beta)\sigma_1^2$
and $(1 - \alpha_1'')(1 - \beta)\sigma_1^2$, respectively. $U_2$ sends signals $LR_2$ and
$HR_2$, which are independent WGN signals with variance $\alpha_2\sigma_2^2$
and $(1 - \alpha_2)\sigma_2^2$, respectively. Note that all $\alpha$'s and $\beta$ are in
[0,1]. We decode signals (performing interference cancellation)
in the order: first $LR_1'$, second $LR_2$, third $LR_1''$, fourth
$HR_1''$, fifth $HR_2$ and sixth $HR_1'$. Our arguments can easily be
extended to more than two users. Our rate region is defined
as the achievable rates for the cases where we have both users,
user 1 only, user 2 only, and no users. Our coding scheme
achieves the rate region.
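
For the case in which both users are present, the rates supported by the stated decoding order can be sketched with the standard Gaussian successive-cancellation formula: each signal is decoded while the signals not yet decoded are treated as additional Gaussian noise. The sketch covers only this one case and says nothing about the other three cases of the rate region; the function and parameter names are hypothetical.

```python
import math


def split_powers(s1, s2, beta, a1p, a1pp, a2):
    """Signal powers in the decoding order LR1', LR2, LR1'', HR1'', HR2, HR1'."""
    return [
        a1p * beta * s1,                    # LR1'
        a2 * s2,                            # LR2
        a1pp * (1.0 - beta) * s1,           # LR1''
        (1.0 - a1pp) * (1.0 - beta) * s1,   # HR1''
        (1.0 - a2) * s2,                    # HR2
        (1.0 - a1p) * beta * s1,            # HR1'
    ]


def sic_rates(powers, noise_var):
    """Rates in bits per channel use when signals are decoded in list order."""
    rates = []
    remaining = sum(powers)
    for p in powers:
        remaining -= p  # power of the signals still to be decoded
        rates.append(0.5 * math.log2(1.0 + p / (noise_var + remaining)))
    return rates
```

A useful sanity check is that the six rates always add up to $\frac{1}{2}\log_2\left(1 + (\sigma_1^2 + \sigma_2^2)/\sigma_N^2\right)$, whatever the splitting parameters are; the parameters only decide how that sum is shared among the signals.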

III.  Expected  rate. 

We may select the $\alpha$'s and $\beta$ to maximize the expected
achievable  rate,  when  the  users’  energies  and  probabilities  of 
transmission  are  fixed.  An  interesting  special  case  arises  when 
both  users  transmit  with  equal  probability.  Our  results  show 
that,  regardless  of  whether  we  operate  at  high  SNR  or  low 
SNR,  when  the  users  have  SNRs  which  are  comparable,  then 
we  do  not  need  to  split  the  users  between  HR  and  LR.  When 
we  have  highly  asymmetrical  SNRs,  then  for  low  enough  trans¬ 
mission  probability,  such  splitting  is  required  to  achieve  the 
capacity  region. 

IV.  Conclusions. 

We  have  determined  a  capacity  region  (where  capacity  re¬ 
gion  refers  to  the  rates  achievable  under  the  four  scenarios 
described  above)  for  an  ALOHA  system  in  the  case  of  a  single 
time  slot  with  very  long  length,  such  that  we  can  achieve  ca¬ 
pacity  over  a  single  packet  transmission.  We  may  extend  our 
results  to  several  time  slots. 

Instead  of  considering  a  single  transmission,  we  can  con¬ 
sider  several  transmissions.  In  the  limit  as  the  number  of 
transmissions  is  arbitrarily  large,  preliminary  results  show 
that  our  system  is  stable:  if  the  average  rate  arriving  to  the 
system  is  below  the  expected  rate,  that  rate  can  be  reliably 
received. Another interesting area of further research is maximizing
the expected rate when the average power (determined
by the product of the average per-time slot power given by $\sigma^2$
and the probability of transmission) is fixed.

Abstract — An equation is derived whose solution(s)
yield the number of transmitting mobiles in the equilibrium
state(s) for a DS-CDMA network employing
code combining (CC). Numerical results for a simple
form of CC show that while there may exist multiple
equilibria, these are typically clustered together, and
do not cause a significant degradation in throughput.
The results also show that CC is capable of eliminating
bistability, and of having a single equilibrium state
at which the throughput is slightly better than that
at the desirable equilibrium state for the corresponding
DS-CDMA network employing automatic repeat
request.


I.  Introduction 

Code  combining  is  known  to  enhance  throughput  over  point- 
to-point  links.  A  receiver  operating  under  code  combining 
does  not  discard  information  from  a  transmission  received  in 
error;  instead  it  requests  an  additional  transmission  and  com¬ 
bines  information  from  the  new  transmission  with  that  from 
the  original  one  with  the  goal  of  increasing  the  probability  of 
successful  reception.  We  investigate  the  effect  of  using  code 
combining  as  the  link  layer  protocol  on  the  stability  of  a  direct- 
sequence  code  division  multiple  access  (DS-CDMA)  network. 
Details  may  be  found  in  [1]. 

II.  Model 

Consider  a  DS-CDMA  packet  data  network  consisting  of  a 
single  base  station  and  M  mobiles.  Each  mobile  transmits 
fixed  length  packets  with  a  spreading  gain  of  N  chips  per  bit. 
The  time-axis  is  divided  into  contiguous  equal-length  slots, 
each  of  which  has  a  duration  equal  to  the  time  required  to 
transmit  a  single  packet.  Mobiles  initiate  transmissions  only 
at  the  beginning  of  time  slots.  The  spreading  code  used  by  a 
mobile is assumed to change from slot to slot (e.g., IS-95), so
that  the  outcomes  of  the  transmission  attempts  of  the  active 
mobiles  are  assumed  to  be  mutually  independent,  both  within 
a  slot,  and  across  time  slots.  Perfect  power  control  is  assumed. 

Each  mobile  has  a  buffer  of  size  one.  If  a  mobile  has  a 
packet  in  its  buffer,  it  is  considered  “active”;  otherwise,  it 
is “idle”. In any given time slot, all active mobiles transmit,
and each idle mobile generates a packet with probability
$\lambda$. The receivers at the base station are assumed to be of
the  conventional  matched  filter  type.  Neglecting  the  effect  of 
thermal  noise,  the  bit-energy-to-interference  density  ratio  at 
the  receiver  for  any  given  mobile  becomes  €blh{i)  —  r,~\\  < 
where  j  is  the  number  of  active  mobiles.  The  probability  of 
a successful packet transmission by an active mobile is a
nondecreasing function of the $E_b/I_0$ at the receiver throughout
the packet transmission time.

This work was supported in part by the U.S. Army Research
Office  under  grants  DAAH04-96-1-0377  and  DAAH04-96-1-0177. 


III.  Equilibrium  states 


We assume that there is some maximum number $E$ of transmission
attempts that may be made for a given data packet.
A packet is discarded after $E$ attempts; higher layer protocols
will treat it as lost, and take the appropriate action. Define
the random sequence $S = \{S_n; n \in \mathbb{Z}_+\}$, where $S_n = (\mathbf{x}_n, \mathbf{h}_n)$,
$\mathbf{x}_n = (x_n^1, x_n^2, \dots, x_n^E)$, and $\mathbf{h}_n = (h_n^1, h_n^2, \dots, h_n^E)$. Here $x_n^j
\in [0, 1]$ is that fraction of all mobiles which, in slot $n$, are making
their $j$th attempt at transmitting some data packet, and
$h_n^j = \sum_{k=1}^{E} x_{n-j+1}^k$ is that fraction of all mobiles that are
active in slot $n - j + 1$. The vector $\mathbf{h}_n$ represents the interference
history of active mobiles. Call $S_n$ the state of the network at
time $n$, $n \in \mathbb{Z}_+$. It follows from the modeling assumptions in
Section II that $S$ forms a Markov chain.

Denote a typical state of the network by $s$, where $s =
(\mathbf{x}, \mathbf{h})$, $\mathbf{x} = (x^1, \dots, x^E)$, and $\mathbf{h} = (h^1, \dots, h^E)$. Write
$x = x^1 + \cdots + x^E$ for the total fraction of mobiles that are active.
Let $p^j(\mathbf{h})$ denote the probability of successful reception for
packets being transmitted for the $j$th time when the network
is in state $s = (\mathbf{x}, \mathbf{h})$.

Equilibrium states are those states $s$ for which $\mathbb{E}[S_{n+1} \mid S_n = s] = s$. Solving this system of equations yields $h^j = x$, $x^j = \lambda(1-x)\prod_{k=1}^{j-1}\bigl(1-p^k(\mathbf{h})\bigr)$, $j = 1, 2, \ldots, E$, and

$$x = \lambda(1-x)\Bigl(1+\sum_{j=2}^{E}\prod_{k=1}^{j-1}\bigl(1-p^k(\mathbf{h})\bigr)\Bigr).$$

Thus, although the state space is multi-dimensional, for a given arrival rate $\lambda$ the equilibrium states $s$ are uniquely determined by the fraction $x$ of mobiles that are active in those states. Further, given $\lambda$, the values of $x$ corresponding to the equilibrium states are given by the solutions of the equation

$$\lambda = \frac{x}{(1-x)\Bigl(1+\sum_{j=2}^{E}\prod_{k=1}^{j-1}\bigl(1-p^k(x,\ldots,x)\bigr)\Bigr)}, \qquad 0 < x < 1.$$
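The equilibrium condition above can be explored numerically. The following is a minimal sketch, not the authors' method: it assumes a hypothetical success probability `success_prob` that is the same for every attempt and decays exponentially with the active fraction, and it locates the equilibria for a given arrival rate by scanning (0, 1) for sign changes and refining with bisection. The names `success_prob`, `arrival_rate`, `equilibria`, and `load_scale` are illustrative only.

```python
import math

def success_prob(x, load_scale=4.0):
    # Hypothetical stand-in for p^k(x, ..., x): success gets less likely as load grows.
    return math.exp(-load_scale * x)

def arrival_rate(x, max_attempts, p=success_prob):
    # lambda(x) = x / ((1 - x) * (1 + sum_{j=2}^{E} prod_{k=1}^{j-1} (1 - p^k)))
    fail = 1.0 - p(x)        # assumed identical for every attempt k
    total, prod = 1.0, 1.0   # the j = 1 term is an empty product
    for _ in range(2, max_attempts + 1):
        prod *= fail
        total += prod
    return x / ((1.0 - x) * total)

def equilibria(lam, max_attempts, p=success_prob, grid=20000, tol=1e-12):
    # Scan (0, 1) for sign changes of lambda(x) - lam, then refine by bisection.
    g = lambda x: arrival_rate(x, max_attempts, p) - lam
    xs = [(i + 0.5) / grid for i in range(grid)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) < 0.0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots
```

With a single allowed attempt the equation reduces to $\lambda = x/(1-x)$, so `equilibria(0.5, 1)` should return one root near 1/3. Whether several roots appear for larger `max_attempts` depends entirely on the assumed success-probability model.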

As  the  number  of  mobiles  and  the  spreading  gain  tend  to 
infinity  (with  their  ratio  held  constant),  the  evolution  of  the 
stochastic  system,  by  the  law  of  large  numbers,  converges  to 
a  deterministic  trajectory.  If  this  deterministic  trajectory  is 
globally  asymptotically  stable,  then  the  steady  state  proba¬ 
bility  distribution  of  the  fraction  of  mobiles  that  are  active 
converges  to  a  point  mass  at  the  unique  equilibrium.  It  is  not 
known  whether  the  deterministic  system  is  globally  asymp¬ 
totically  stable  in  general;  finding  the  appropriate  Lyapunov 
function  remains  an  open  problem. 


IV.  Numerical  results 

Numerical  results  show  that  the  use  of  CC  can  eliminate 
the  undesired  equilibrium  state  present  in  conventional  DS- 
CDMA  networks,  thereby  significantly  improving  throughput. 
In  cases  for  which  CC  has  multiple  equilibria,  the  numerical 
results  show  that  these  equilibria  are  typically  close  together 
and  do  not  significantly  degrade  throughput. 

Abstract — We consider a user communicating over a flat fading channel. The user wishes to reliably communicate bursty data over this channel while minimizing both the average power and the average delay incurred. We formulate a buffer control problem which illustrates the trade-off between these quantities. This model is analyzed using dynamic programming techniques. The asymptotic performance is shown and asymptotically optimal buffer control policies are given.

I.  Introduction 

Motivated  by  wireless  communication,  channel  models  where 
the  output  conditionally  depends  on  a  time-varying  channel 
state  have  received  much  attention.  Cases  where  either  the 
transmitter  or  receiver  have  access  to  channel  state  informa¬ 
tion  (CSI)  have  been  well  studied.  Using  the  CSI,  the  trans¬ 
mitter  can  allocate  communication  resources  over  time  in  an 
effort  to  combat  the  fading.  In  this  work,  we  study  such  re¬ 
source  allocation  problems  for  a  single  user  in  a  flat  fading 
channel  when  both  the  transmitter  and  the  receiver  have  per¬ 
fect  CSI.  The  goal  is  to  minimize  the  transmission  power  re¬ 
quired  to  provide  the  user  with  an  acceptable  quality  of  ser¬ 
vice.  Minimizing  power  is  an  important  consideration  since 
mobile  users  rely  on  batteries  with  limited  energy. 

If  the  user  simply  required  a  long  term  average  rate,  then 
minimizing  the  average  power  needed  to  communicate  reliably 
is  equivalent  to  characterizing  the  channel’s  capacity.  This 
has been well studied for a large class of fading channels. Approaching the capacity of a fading channel typically requires the use of codewords whose length is long enough to average over the channel statistics. We consider the situation where, in
addition  to  an  average  rate,  the  user  requires  a  given  average 
delay. When delay constraints limit codeword lengths, capacity may not be a useful performance criterion; i.e., one cannot get an acceptable probability of error at rates near
capacity  while  satisfying  the  delay  constraint.  This  is  the 
motivation  behind  the  work  on  outage  capacity  and  delay- 
limited  capacity.  We  also  assume  that  messages  arrive  from  a 
higher  layer  protocol  in  a  bursty  manner  and  are  placed  into 
a  transmission  buffer.  Delay  requirements  may  prevent  one 
from  removing  this  burstiness  through  source  coding. 

We consider the following model. Assume that the messages are fixed-length packets of $\log M$ bits which arrive from some higher layer application and are placed into a transmission buffer. Let $A_n$ be the number of packets that arrive at time $n$, where $\{A_n\}$ is an ergodic Markov chain with steady-state average arrival rate $A$. Periodically, the transmitter removes a packet from the buffer, encodes it into one of $M$ codewords of infinite length, and begins transmitting the codeword over a fading channel. Assume the channel is a complex additive white Gaussian noise channel with a time-varying gain $H_n \in \mathbb{C}$. The process $\{H_n\}$ is also modeled as an ergodic Markov chain. While transmitting, the transmitter can adjust the transmission energy by scaling the input by an adjustable gain. This decision is based on the channel state, the buffer occupancy, and the current source state. Once the receiver can decode the message with an acceptable probability of error, the transmitter stops transmitting the packet and proceeds to the next packet. We formulate a new buffer model where the buffer occupancy corresponds to the amount of error exponent required by each packet; this is a variation of the model used in [1]. At each time $n$, the amount of exponent to be transmitted, $U_n$, is chosen. A given choice of $U_n$ requires $P(H_n, U_n)$ average transmission energy.

1This work was supported by the Army Research Office under grant DAAG55-97-1-0305.

We consider the problem of minimizing the average transmission power subject to a given average delay constraint. Let $P^*(D)$ be the minimum average power required for the average delay to be less than $D$. We show that $P^*(\cdot)$ is always a non-increasing convex function. Each point of $P^*(D)$ can be found by minimizing

$$\lim_{m\to\infty}\frac{1}{m}\,E\Bigl[\sum_{n=1}^{m}\bigl(P(H_n,U_n)+\beta S_n\bigr)\Bigr]$$

for an appropriate choice of $\beta$. Here $S_n$ corresponds to the buffer occupancy at time $n$. This corresponds to solving an average cost dynamic programming problem where the cost is a weighted sum of the average power and average delay.

We study the behavior of $P^*(D)$ as $D \to \infty$. Our approach to this problem is similar to the work in [2] on buffer control for variable rate lossy compression. The mathematical structure underlying these problems has many similarities. Let $V(A)$ denote the limiting value of $P^*(D)$. We characterize $V$ and show that $P^*(D) - V(A) = O(1/D^2)$. Finally, a sequence of simple policies is given which exhibits this optimal convergence rate. These policies have the characteristic that the transmission rate is a function only of the channel state and of which of two regions the current buffer state lies in.

Abstract — We compare the capacities of M-ary pulse position modulation (PPM) on Gaussian and Webb channels, which are often used to model optical channels with avalanche photodiode (APD) detectors. Both types of channels exhibit the same brickwall thresholds on minimum signal-to-noise ratio per information bit (bit-SNR) for different values of M.

Consider  a  symmetric  channel  with  input  signals  x  restricted  to  an 
M-ary  orthogonal  constellation  (such  as  PPM)  and  no  restriction  on 
the  channel  outputs  y.  The  maximum  mutual  information  between  x 
and  y  is  achieved  with  an  equiprobable  distribution  on  the  inputs,  and 
the  channel  capacity  can  be  evaluated  as 

$$C = \log_2 M - E_{v|x_1}\log_2\Bigl[\sum_{j=1}^{M}\frac{p(v|x_j)}{p(v|x_1)}\Bigr] \qquad (1)$$

where $v$ is any random vector obtained from $y$ via an invertible transformation.

For a standard additive white Gaussian noise channel (AWGN-1), the components of the channel output vector $y$, given one of the orthogonal inputs $x_j$, are conditionally independent Gaussian random variables, identically distributed except for $y_j$: $y_i$ is $N(0, \sigma^2)$, $i \ne j$, and $y_j$ is $N(m, \sigma^2)$. The capacity is evaluated from (1), using $v_j = y_j/\sigma$ and $\rho = m^2/\sigma^2$:

$$C(\rho) = \log_2 M - E_{v|x_1}\log_2\sum_{j=1}^{M}\exp\bigl[\sqrt{\rho}\,(v_j - v_1)\bigr] \qquad (2)$$

A "double" AWGN channel (AWGN-2) adds greater noise to the orthogonal component in the direction of the signal. The components of the channel output $y$, given one of the orthogonal inputs $x_j$, are conditionally independent Gaussian random variables, identically distributed except for $y_j$: $y_i$ is $N(m_0, \sigma_0^2)$, $i \ne j$, and $y_j$ is $N(m_1, \sigma_1^2)$, with $m_1 > m_0$ and $\sigma_1 > \sigma_0$. The capacity evaluated from (1) is

$$C(\rho,\gamma) = \log_2 M - E_{v|x_1}\log_2\sum_{j=1}^{M}\exp\bigl[\gamma\sqrt{\rho}\,(v_j - v_1) + (1-\gamma)(v_j^2 - v_1^2)/2\bigr] \qquad (3)$$

where the (conditional) statistics of $v_j = (y_j - m_0)/\sigma_0$, and hence the capacity, depend on two parameters $\rho \triangleq (m_1 - m_0)^2/\sigma_0^2$ and $\gamma \triangleq \sigma_0^2/\sigma_1^2 < 1$, rather than on four parameters $m_0$, $m_1$, $\sigma_0$, $\sigma_1$.

An optical channel with APD detectors can be modeled as a "double" Webb channel (Webb-2), plus additional Gaussian thermal noise [1]. A Webb random variable $W(m, \sigma^2, \delta^2) = m + w\sigma$ is a scaled-and-translated version of a standardized Webb random variable $w \triangleq W(0, 1, \delta^2)$ having probability density

$$p(w;\delta^2) = \frac{1}{\sqrt{2\pi}}\,(1 + w/\delta)^{-3/2}\exp\Bigl(-\frac{w^2}{2(1 + w/\delta)}\Bigr), \qquad w > -\delta.$$

For a pure Webb-2 channel, the components of the channel output $y$, given one of the orthogonal inputs $x_j$, are conditionally independent Webb random variables, identically distributed except for $y_j$: $y_i$ is $W(m_0, \sigma_0^2, \delta_0^2)$, $i \ne j$, and $y_j$ is $W(m_1, \sigma_1^2, \delta_1^2)$, with $m_1 > m_0$, $\sigma_1 > \sigma_0$, and $\delta_1 > \delta_0$. The optical APD channel model imposes an additional interrelationship $\gamma = \delta_0^2/\delta_1^2$. The capacity is then evaluated from (1) in terms of $\Delta \triangleq \delta_1^2 - \delta_0^2$ as

$$C(\rho,\gamma,\Delta) = \log_2 M - E_{v|x_1}\log_2\sum_{j=1}^{M}\frac{p\bigl(\sqrt{\gamma}\,(v_j - \sqrt{\rho});\ \tfrac{\Delta}{1-\gamma}\bigr)\,p\bigl(v_1;\ \tfrac{\gamma\Delta}{1-\gamma}\bigr)}{p\bigl(\sqrt{\gamma}\,(v_1 - \sqrt{\rho});\ \tfrac{\Delta}{1-\gamma}\bigr)\,p\bigl(v_j;\ \tfrac{\gamma\Delta}{1-\gamma}\bigr)} \qquad (4)$$


where  vj,  p,  and  y  have  the  same  definitions  (in  terms  of  the  Webb-2 
channel  variables)  as  for  the  AWGN-2  model. 

We evaluated the $M$-dimensional expectations in (2), (3), and (4) accurately via Monte Carlo simulation. Some results are plotted in Fig. 1 for the AWGN-1 and Webb-2 channels for different PPM orders $M$. The abscissa in this figure is a normalized bit-SNR, $\rho_b = \rho/(2C)$. Along each Webb-2 curve, the two independent variables held constant are $\Delta = 60.8$ and $\rho\gamma/(1-\gamma) = 17.6$, which correspond to a representative optical APD problem with $\bar{n}_s = 38$ detected signal photons per PPM word and an excess noise factor $F = 2.16$. The Webb-2 capacity curves for each $M$ exhibit the same brickwall thresholds on minimum $\rho_b$ as the AWGN-1 capacity curves. For different $M$, these thresholds are offset from each other by a factor $M/(M-1)$, representing the penalty for using orthogonal signals instead of a simplex signal set. In the limit as $M \to \infty$, the minimum $\rho_b$ approaches (for both AWGN-1 and Webb-2) the well-known bit-SNR threshold of $-1.59$ dB for a standard AWGN channel with no restriction on the channel inputs.
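As an illustration of the Monte Carlo evaluation mentioned above, the following sketch estimates the AWGN-1 capacity of $M$-ary PPM. It assumes the expectation takes the form $C(\rho) = \log_2 M - E\,\log_2\sum_j \exp[\sqrt{\rho}(v_j - v_1)]$ with $v_1 \sim N(\sqrt{\rho}, 1)$ and the remaining $v_j \sim N(0, 1)$; the function name `ppm_capacity_awgn1`, the trial count, and the seed are illustrative choices, not taken from the paper.

```python
import math
import random

def ppm_capacity_awgn1(rho, M, trials=20000, seed=0):
    # Monte Carlo estimate of C(rho) in bits per PPM word (sketch, see assumptions above).
    rng = random.Random(seed)
    s = math.sqrt(rho)
    acc = 0.0
    for _ in range(trials):
        v1 = s + rng.gauss(0.0, 1.0)          # signal slot
        # Exponents sqrt(rho) * (v_j - v_1); the j = 1 term is exactly 0.
        exps = [0.0] + [s * (rng.gauss(0.0, 1.0) - v1) for _ in range(M - 1)]
        mx = max(exps)                         # log-sum-exp for numerical stability
        lse = mx + math.log(sum(math.exp(e - mx) for e in exps))
        acc += lse / math.log(2.0)
    return math.log2(M) - acc / trials
```

Given an estimate `c = ppm_capacity_awgn1(rho, M)`, the normalized bit-SNR used as the abscissa would then be `rho / (2 * c)`. The estimate is zero at `rho = 0` and approaches `log2(M)` at high SNR.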


Fig.  1:  Capacity  of  M-ary  PPM  on  AWGN-1  and  Webb-2  channels. 

Abstract — The capacity of a wireless link is studied when multiple transmit and receive antennas are used. Under the assumption of a narrow-band link and a rich scattering environment, the propagation medium is modeled as Rayleigh flat fading with good receive diversity. In contrast to previous work, the assumption of decorrelated transmit antennas is relaxed. This enables a study of some common scenarios where the transmit antennas occupy a limited volume. For an arbitrary correlation between the transmitting antennas, tight capacity bounds are calculated and an optimal signaling scheme is derived.

I.  Introduction 

Recently, Foschini et al. studied the capacity of a narrow-band wireless link between multiple transmit and receive antennas, and nearly optimal transmission schemes, when the propagation channel is assumed Rayleigh and flat with i.i.d. coefficients [1]. Such a model is rather inaccurate when multiple transmit antennas occupy a limited volume which is situated far away from the receive antennas. The independence condition is relaxed here so that the channels corresponding to different transmit antennas may exhibit an arbitrary correlation. A tight lower bound on the channel capacity, presented in this work, yields an optimal transmission scheme. This bound shows that a limited transmit volume is characterized by a limited capacity; this capacity may be achieved with a finite number of transmit antennas when the received signal is due to local scattering in the receiver vicinity.

II.  Main  results 

Consider a flat fading channel between $m$ transmitting and $M$ receiving antennas such that

$$\mathbf{x}_t = \mathbf{H}\mathbf{s}_t + \mathbf{n}_t, \qquad t \in \mathbb{R}, \qquad (1)$$

where $\mathbf{s}_t$ is the $m \times 1$ vector of the transmit antenna outputs, $\mathbf{x}_t$ is the $M \times 1$ vector of the received signals, $\mathbf{H}$ is the $M \times m$ channel matrix and $\mathbf{n}_t$ is the $M \times 1$ vector of the AWGN. Assume that each entry of $\mathbf{s}_t$ is an i.i.d. series and that these entries may be mutually correlated with fixed total power: $E\{\mathbf{s}_t\mathbf{s}_t^H\} = \sigma_s^2\,\mathbf{C}$, $\mathrm{tr}(\mathbf{C}) = 1$. Define $\rho^2 = \sigma_s^2/\sigma_n^2$, the signal-to-noise ratio (SNR). According to [2], the channel capacity (in bits per second per hertz) is given by

$$C = \log_2\det\bigl(\mathbf{I}_M + \rho^2\,\mathbf{H}\mathbf{C}\mathbf{H}^H\bigr). \qquad (2)$$

The Rayleigh channel model is assumed so that the elements of $\mathbf{H}$ are jointly complex circular Gaussian. Assume arbitrary correlations of the transmit antennas specified by a normalized correlation matrix $\mathbf{R}_T$ with entries $E\{H_{l,k}H^*_{l,k'}\}$, $1 \le k, k' \le m$, whereas the receive antennas are decorrelated (i.e., $\mathbf{R}_R = \mathbf{I}_M$ over the receive indices $1 \le l \le M$). To introduce the core result, we define the eigendecomposition $\{\mathbf{U}, \mathbf{\Lambda}^2\}$ of $\mathbf{R}_T^{1/2}\mathbf{C}\mathbf{R}_T^{1/2}$ such that $\mathbf{R}_T^{1/2}\mathbf{C}\mathbf{R}_T^{1/2} = \mathbf{U}\mathbf{\Lambda}^2\mathbf{U}^H$ with a diagonal $\mathbf{\Lambda} = \mathrm{diag}\{\Lambda_k\}_{k=1}^{m}$ and a unitary $\mathbf{U}$. Then the capacity in (2) admits an accurate lower bound $C_* \le C$ such that

$$C_* = \sum_{k=1}^{m}\log_2\bigl(1 + \rho^2\Lambda_k^2\,\chi_{M-k+1}\bigr), \qquad \Lambda_1 \ge \ldots \ge \Lambda_m, \qquad (3)$$

where the random quantities $\chi_{M-k+1}$ are Gamma distributed with $(M-k+1)$ degrees of freedom. This bound is shown to be tight at high and moderate SNR and large $M$. The bound in [1] is a particular case of (3) when $\mathbf{R}_T = \mathbf{I}_m$ and $\mathbf{C} = (1/m)\,\mathbf{I}_m$.

The optimal signaling that maximizes the approximate expected value of the capacity in (3) is derived. An accurate approximation is due to Jensen's inequality: $E\{C_*\} \le C_\infty$,

$$C_\infty = \sum_{k=1}^{m}\log_2\bigl(1 + \rho^2\Lambda_k^2\,(M+1-k)\bigr), \qquad \Lambda_1 \ge \ldots \ge \Lambda_m. \qquad (4)$$

The capacity $C_*$ is also shown to converge in probability to $C_\infty$ when $M$ and $m$ are large. This capacity may be reached when $\mathbf{C}$ has the eigenbasis $\mathbf{U}$ with eigenvalues that obey the water pouring distribution for a given set $\{\rho^2\Lambda_k^2(M+1-k)\}_{k=1}^{m}$.
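The water pouring allocation referred to above can be sketched generically. The code below is an illustration under stated assumptions, not the paper's derivation: it takes a list of hypothetical positive per-mode gains `gains` (standing in for the set of effective gains) and splits a unit power budget so as to maximize $\sum_k \log_2(1 + g_k c_k)$. The helper `c_inf` simply evaluates a sum of the form $\sum_k \log_2(1 + \rho^2\Lambda_k^2(M+1-k))$ for given inputs. All function and parameter names are illustrative.

```python
import math

def water_pouring(gains, total_power=1.0):
    # Classic water-filling: c_k = max(0, level - 1/g_k) with sum(c_k) = total_power.
    # Assumes every gain is strictly positive.
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    alloc = [0.0] * len(gains)
    for n_active in range(len(gains), 0, -1):
        active = order[:n_active]
        level = (total_power + sum(1.0 / gains[k] for k in active)) / n_active
        if level - 1.0 / gains[active[-1]] > 0.0:   # weakest active mode still gets power
            for k in active:
                alloc[k] = level - 1.0 / gains[k]
            break
    return alloc

def c_inf(rho2, lam2, M):
    # Deterministic approximation: sum_k log2(1 + rho^2 * Lambda_k^2 * (M + 1 - k)).
    return sum(math.log2(1.0 + rho2 * l2 * (M + 1 - k))
               for k, l2 in enumerate(lam2, start=1))
```

For example, `water_pouring([1.0, 1.0])` splits the power evenly, while `water_pouring([10.0, 0.1])` puts all power on the strong mode. How the resulting eigenvalues map back onto the transmit covariance depends on the correlation model and is not captured by this sketch.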

III.  Numerical  example 
Consider a WLAN scenario in the 5.2 GHz band; 6 transmit and 8 receive linear antenna arrays of size 30 cm are separated by 30 m. Major scatterers are uniformly distributed in the receiver vicinity. In Fig. 1, solid lines show the empirical capacity obtained from 10000 random trials of a physical propagation model that assumes free space path loss. The empirical capacity driven by a stochastic model (specified by $\mathbf{R}_T$), its stochastic bound $C_*$ and the deterministic approximation $C_\infty$ are depicted by dashed, dash-dotted and vertical lines, respectively, for optimal (water pouring) and uniform power loading. The "i.i.d. bound" stands for the capacity predicted by [1] under the assumption $\mathbf{R}_T = \mathbf{I}_m$.

Fig. 1: $M = 8$, $m = 6$, $\rho^2 = 10$ dB.

Abstract — We present closed-form expressions for
the  single-user  capacity  over  slow  nonselective  corre¬ 
lated  Rayleigh  fading  channels  having  equal  branch 
powers  and  the  same  correlation  between  any  pair 
of  branches.  Maximal  ratio  combining  (MRC)  is 
used  and  three  adaptive  transmission  schemes  are 
analyzed:  (1)  optimal  simultaneous  power  and  rate 
adaptation,  (2)  optimal  rate  adaptation  with  constant 
transmit  power,  (3)  channel  inversion  with  fixed  rate. 

I.  Introduction 

Consider the coherent reception of some digitally modulated signal with $L$ diversity branches and predetection MRC. Let $\gamma_i$, $i = 1, \ldots, L$, denote the instantaneous signal-to-noise ratio (SNR) of the $i$th diversity branch. The random variables $\sqrt{\gamma_1}, \ldots, \sqrt{\gamma_L}$ are identically distributed, each having a marginal distribution which is Rayleigh with second moment $2\sigma^2$ ($\sigma > 0$); the covariance between any pair $\gamma_i, \gamma_j$, $i \ne j$, is $4\rho^2\sigma^4$ ($0 < \rho < 1$), and therefore the correlation coefficient between $\gamma_i$ and $\gamma_j$ is $\rho^2$. Such a correlation model is appropriate when we use space diversity with closely packed diversity antennas. The total instantaneous received SNR using MRC is given by $\gamma = \sum_{i=1}^{L}\gamma_i$. Denoting $a = \frac{1}{2\sigma^2(1+[L-1]\rho)}$, $b = \frac{1}{2\sigma^2(1-\rho)}$, we obtain from the characteristic function of $\gamma$ the following expression for its probability density function:

$$f_\gamma(v) = a\,b^{L-1}\Bigl[\frac{e^{-av}}{(b-a)^{L-1}} - e^{-bv}\sum_{k=1}^{L-1}\frac{v^{k-1}}{(b-a)^{L-k}(k-1)!}\Bigr], \qquad v \ge 0.$$

The case of i.i.d. branches ($\rho = 0$) has been analyzed in [1].

II. Channel Capacity

Under the condition of optimal simultaneous power and rate adaptation, the channel capacity $C_{opra}$ (in bits/sec) is given by [2] [1] $C_{opra} = B\int_{\gamma_0}^{\infty}\log_2(v/\gamma_0)\,f_\gamma(v)\,dv$, where $B$ (in Hz) is the channel bandwidth and $\gamma_0$ is the optimal cutoff SNR satisfying $\int_{\gamma_0}^{\infty}\bigl(\tfrac{1}{\gamma_0} - \tfrac{1}{v}\bigr)f_\gamma(v)\,dv = 1$.

Denoting the exponential integral of order one by $E_1(c) = \int_1^{\infty}\frac{e^{-cv}}{v}\,dv$, $c > 0$, and the Poisson distribution by $P_k(c) = e^{-c}\sum_{n=0}^{k-1}\frac{c^n}{n!}$, we get the following closed-form expression for the capacity per unit bandwidth (in bits/sec/Hz):

=  A  [e.(«i»>  (A)1"  -  {( A)1"'  - 1} 

n=l 

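Since the transmission is suspended when $\gamma < \gamma_0$, there is an outage probability which is given by

$$P_{out} = 1 - \Bigl(\frac{b}{b-a}\Bigr)^{L-1}e^{-a\gamma_0} + \frac{a}{b}\sum_{k=1}^{L-1}\Bigl(\frac{b}{b-a}\Bigr)^{L-k}P_k(b\gamma_0). \qquad (1)$$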
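The outage probability can be cross-checked numerically against the density of the combined SNR. The sketch below assumes $a = 1/(2\sigma^2(1+(L-1)\rho))$ and $b = 1/(2\sigma^2(1-\rho))$ with $\rho > 0$, a density of the form $a b^{L-1}[e^{-av}/(b-a)^{L-1} - e^{-bv}\sum_{k=1}^{L-1} v^{k-1}/((b-a)^{L-k}(k-1)!)]$, and $P_k(c) = e^{-c}\sum_{n=0}^{k-1}c^n/n!$. The function names are illustrative.

```python
import math

def branch_params(sigma2, rho, L):
    # a and b for equicorrelated branches; requires 0 < rho < 1 so that b != a.
    a = 1.0 / (2.0 * sigma2 * (1.0 + (L - 1) * rho))
    b = 1.0 / (2.0 * sigma2 * (1.0 - rho))
    return a, b

def pdf(v, a, b, L):
    # Density of the MRC output SNR under the assumptions stated in the lead-in.
    s = sum(v ** (k - 1) / ((b - a) ** (L - k) * math.factorial(k - 1))
            for k in range(1, L))
    return a * b ** (L - 1) * (math.exp(-a * v) / (b - a) ** (L - 1)
                               - math.exp(-b * v) * s)

def poisson_cdf(k, c):
    # P_k(c) = exp(-c) * sum_{n=0}^{k-1} c^n / n!
    return math.exp(-c) * sum(c ** n / math.factorial(n) for n in range(k))

def outage(g0, a, b, L):
    # Closed-form probability that the combined SNR falls below the cutoff g0.
    r = b / (b - a)
    tail = sum(r ** (L - k) * poisson_cdf(k, b * g0) for k in range(1, L))
    return 1.0 - r ** (L - 1) * math.exp(-a * g0) + (a / b) * tail
```

A simple check is to integrate `pdf` from 0 to the cutoff with a midpoint rule and compare the result with `outage`; the two should agree to within the quadrature error, and `outage(0.0, a, b, L)` should vanish.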


In  the  case  of  optimal  rate  adaptation  with  constant  trans¬ 
mit  power,  the  channel  capacity  is  given  by  [3]  [2]  [1] 
Cara  —  /0°°  In  (1  +  v)  /7(u)  dv ,  which  yields  the  following 

expression  for  the  capacity  per  unit  bandwidth: 

(tfe)1"’  «•&<«>  -  fEte)1-1’ 

k= 1 

x  jft(-6)£1(6)  +  £  £Pm(6)P*_m(-6) 

1  m=l 

In the case of channel inversion with fixed rate, there are two schemes: truncated channel inversion with fixed rate, and channel inversion with fixed rate without truncation. With the truncation scheme, the channel capacity per unit bandwidth is expressed as [1]

$$\frac{C_{tifr}}{B} = \log_2\Bigl(1 + \frac{1}{\int_{\gamma_0}^{\infty}\frac{f_\gamma(v)}{v}\,dv}\Bigr)(1 - P_{out}), \qquad (2)$$

where $P_{out}$ is given by (1). The cutoff level $\gamma_0$ can be chosen either to achieve a specific outage probability $P_{out}$, or to maximize (2). A closed-form expression for the capacity can be obtained from (2). If we set $\gamma_0 = 0$ in (2), we get $C_{cifr}/B$, the capacity for channel inversion with fixed rate and without truncation. In this case, $P_{out} = 0$.

III.  Numerical  Results 

From plots of the channel capacity per unit bandwidth, we find that the capacity increases with increase of diversity order $L$ and increase of average received SNR per branch $E[\gamma_1] = 2\sigma^2$, as expected. While the capacities $C_{ora}/B$, $C_{cifr}/B$ and $C_{tifr}/B$ decrease with increase of $\rho$, the capacity $C_{opra}/B$ increases sharply with $\rho$ for small positive values of $\rho$, reaches a maximum, and then decreases as $\rho$ increases further. It is also to be noted that the decrease in capacity with increase in $\rho$ is much sharper in the case of optimal power and rate adaptation as compared to the other schemes. In the case of truncated channel inversion, the cutoff SNR $\gamma_0$ which maximizes the capacity decreases with increase of $\rho$. A comparison of the plots for the different schemes shows that for the same channel bandwidth $B$, $C_{opra} > C_{ora} > C_{tifr} > C_{cifr}$.

Abstract — In modeling wireless channels, slow and fast fades are generally decoupled. We show that the difference between true capacity and that obtained assuming independence of fast and slow channel fades is $O(\epsilon\log(\epsilon)\log(-\epsilon\log(\epsilon)))$, where $\epsilon$ is the ratio between the average duration of fast and slow fades.

Our purpose in this work is to explicitly take into account, in the capacity computation for time-varying fading channels, the fact that slow fades and fast fades are not truly decoupled. Decoupling slow fades from fast
fades  has  generally  been  used  as  a  first-order  approxi¬ 
mation.  We  consider  the  case  where  the  sender  channel 
side  information  (SCSI)  is  a  coarse  representation  of  the 
receiver  channel  side  information  (RCSI).  In  many  cir¬ 
cumstances,  RCSI  and  SCSI  are  asymmetric,  although 
related.  In  particular,  when  the  channel  is  rapidly  vary¬ 
ing,  providing  full  feedback  from  the  receiver  to  the  sender 
may  be  onerous  and  inefficient.  Recent  work  in  this  area 
has  considered  the  case  where  the  SCSI  is  a  deterministic 
function  of  the  RCSI  [1].  In  [1],  exact  capacity  results 
are  given  for  the  case  when  the  SCSI  remains  Markov.  If 
the  SCSI  and  the  RCSI  can  indeed  be  decoupled,  in  such 
a  way  that  both  remain  Markov,  then  the  results  of  [1] 
apply  directly. 

We  consider  a  discrete-time  finite-state  Markov  chan¬ 
nel  (FSMC).  The  RCSI,  which  we  term  the  micro  states, 
is  a  full  description  of  the  FSMC.  The  SCSI,  which  we 
term  the  macro  states,  is  a  coarser  representation  of  the 
states:  the  sender  only  knows  that  the  current  state  is 
within one of a set of states. The macro states represent the long-term behavior of the channel, i.e., the slow fades. Note that fades are possible while we are in the good macro state and, conversely, energy surges are possible while we are in the bad macro state. Although the
model  of  [1]  does  not  apply,  we  suspect  that,  as  the  spread 
between  the  speed  of  the  slow  fades  and  that  of  the  fast 
fades  grows,  the  results  of  [1]  should  become  an  increas¬ 
ingly  good  approximation  to  the  true  capacity.  Our  re¬ 
sults  support  this  intuition  and  quantify  the  effect  of  the 
spread  between  the  speed  of  the  slow  fades  and  that  of 
the  fast  fades.  However,  our  results  also  show  that  con¬ 
vergence  is  very  slow. 

We consider a nearly decomposable model ([2]) for our FSMC. Consider a discrete-time Markovian fading process defined by the stochastic matrix $A + \epsilon B$, where $A$ is block-diagonal with $M$ blocks and the $i$th block (which is also a stochastic matrix) is denoted by $A_i$. We call the set of fading states associated with the $i$th block a macro state and denote it by $S_i$. We assume the RCSI is the current micro state of the channel whereas the SCSI is the current macro state. Let $\pi^{(i)}$ be the stationary probability vector associated with $A_i$, i.e., $\pi^{(i)}A_i = \pi^{(i)}$.


Define an $M \times M$ matrix $P$ as follows: the $(i,j)$ entry of $P$ is given by $P_{ij} = \sum_{k \in S_i}\sum_{l \in S_j}\pi^{(i)}_k B_{kl}$ for $i \ne j$, and $P_{ii} = 1 - \sum_{j \ne i}P_{ij}$. Note that $P$ is also a stochastic matrix and let $p$ be its stationary probability vector, i.e., $p = pP$. We can interpret $P$ as being the long-term transition probabilities among macro-states and $p_i$ as approximating the long-term probability of being in $S_i$, i.e., $p_i(\epsilon) = p_i + O(\epsilon)$, where $p_i(\epsilon)$ is the actual probability of being in macro-state $i$.
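The aggregation step described above can be sketched for a small example. The code assumes that the off-diagonal entries of the aggregated matrix are $P_{ij} = \sum_{k \in S_i}\sum_{l \in S_j}\pi^{(i)}_k B_{kl}$ and that each diagonal entry makes its row sum to one. The blocks `A1`, `A2` and the coupling matrix `B` are made-up illustrative values, and the stationary vectors are obtained by plain power iteration, which presumes irreducible, aperiodic chains.

```python
def stationary(P, iters=2000):
    # Power iteration from the uniform vector; assumes an irreducible, aperiodic chain.
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
    return pi

def aggregate(blocks, B):
    # blocks[i] is the stochastic block A_i; B is the full coupling matrix.
    starts, pos = [], 0
    for blk in blocks:
        starts.append(pos)
        pos += len(blk)
    pis = [stationary(blk) for blk in blocks]
    M = len(blocks)
    P = [[0.0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            if i != j:
                P[i][j] = sum(pis[i][k] * B[starts[i] + k][starts[j] + l]
                              for k in range(len(blocks[i]))
                              for l in range(len(blocks[j])))
        P[i][i] = 1.0 - sum(P[i][j] for j in range(M) if j != i)
    return P

# Hypothetical two-macro-state example (values chosen only for illustration).
A1 = [[0.5, 0.5], [0.5, 0.5]]
A2 = [[0.9, 0.1], [0.2, 0.8]]
B = [[-0.4, 0.0, 0.4, 0.0],
     [0.0, -0.2, 0.2, 0.0],
     [0.3, 0.0, -0.3, 0.0],
     [0.0, 0.6, 0.0, -0.6]]
P_macro = aggregate([A1, A2], B)
p_macro = stationary(P_macro)
```

For these values the aggregated matrix is approximately [[0.7, 0.3], [0.4, 0.6]], whose stationary vector is (4/7, 3/7); this is the quantity that approximates the long-term macro-state probabilities up to $O(\epsilon)$.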

Let $T(n)$ denote the random variable corresponding to the micro-state at time $n$ and define $S(n)$ to be the random variable corresponding to the macro-state at time $n$. The sample value of $T(n)$ is denoted by $t(n)$. Further, let $G(T(n))$ be the random variable corresponding to the signal attenuation at time $n$. The received signal at time $n$ is given by the random variable $Y(n) = \sqrt{G(T(n))}\,X(n) + W(n)$, where $X(n)$ is the transmitted signal and $W(n)$ is AWGN with variance $\sigma^2$. Our coding theorem follows.


Theorem  1 


Define 

M 


{P(i)}  2  ^  6  V  a2  / 

i= 1  t£Mi  V  ' 


.(*) 


subject 


to  J^ili  Pi(e)P(i)  —  where  V  is  the  power  constraint 
on  the  sender. 

Given $R < C$ and $\delta > 0$, we can find an $\epsilon^*(R)$ and $n(\delta)$ such that for all $\epsilon < \epsilon^*$, there exists a $(n/\epsilon, 2^{nR/\epsilon})$ code whose maximal probability of error is less than $\delta$. (Note that $\epsilon^*$ is independent of $\delta$.)


Define

$$C_{true}(\epsilon) = \lim_{n\to\infty}\frac{1}{n}\max_{p(x^n|s^n)} I\bigl(X^n;\{Y^n,T^n\}\bigr),$$

where $p(x_k|s^n) = p(x_k|s^k)$ for all $k \le n$. Then $C_{true}(\epsilon) = C + O(\epsilon\log(\epsilon)\log(-\epsilon\log(\epsilon)))$.

Suppose for some $R > 0$, we have the following property: for every $\delta > 0$, we can find $\epsilon^*(R)$ and $n(\delta)$ such that for all $\epsilon < \epsilon^*$, there exists a $(n/\epsilon, 2^{nR/\epsilon})$ code whose maximal probability of error is less than $\delta$. Then, $R < C$.

Abstract — We apply the results of [2] to estimation of Rényi I-divergence between an unknown distribution and a known reference distribution using power weighted pruned minimal graphs spanning a random sample of $n$ points from the unknown distribution. In particular, we establish that the weight of a minimal graph connecting the points converges a.s. in $n$ to the I-divergence after a suitable change of measure.

I.  Introduction 

Let $X_n = \{x_1, x_2, \ldots, x_n\}$ denote a sample of i.i.d. data points in $\mathbb{R}^d$ having unknown Lebesgue multivariate density $f(x)$ supported on $[0,1]^d$. Define the order $\nu$ Rényi I-divergence [1] with respect to a dominating reference density $f_0(x)$

$$I_\nu(f,f_0) = \frac{1}{1-\nu}\ln\int\Bigl(\frac{f(x)}{f_0(x)}\Bigr)^{\nu}f_0(x)\,dx \qquad (1)$$

The I-divergence takes on its minimum value (equals zero) if and only if $f = f_0$ (a.e.). $I_\nu(f,f_0)$ reduces to the Rényi entropy $H_\nu(f)$ when $f_0$ is equal to a uniform density over $[0,1]^d$. Special cases of interest are obtained for $\nu = 1/2$, for which one obtains the log Hellinger distance squared, and for $\nu \to 1$, for which one obtains the Kullback-Leibler divergence.

II.  MST’s  and  Entropy  Estimation 

A spanning tree $T$ through the sample $X_n$ is a connected acyclic graph which passes through all the $n$ points $\{x_i\}_i$ in the sample. $T$ is specified by an ordered list of edge (Euclidean) lengths $e_{ij}$ connecting certain pairs $(x_i, x_j)$, $i \ne j$, along with a list of edge adjacency relations. The power weighted length of the tree $T$ is the sum of all edge lengths raised to a power $\gamma \in (0,d)$, denoted by $\sum_{e \in T}|e|^{\gamma}$. The minimal spanning tree (MST) is the tree which has the minimal length $L(X_n) = \min_T\sum_{e \in T}|e|^{\gamma}$. For any subset $X_{n,k}$ of $k$ points in $X_n$ define $T_{X_{n,k}}$, the $k$-point MST which spans $X_{n,k}$. The $k$-MST is defined as that $k$-point MST which has minimum length. Thus the $k$-MST spans the densest $k$-dimensional subset $X^*_{n,k}$ of $X_n$. The $k$-MST computation is NP-complete. In [2] we presented asymptotic results for a $d$-dimensional extension of the planar $k$-MST approximation of Ravi et al., called the greedy $k$-MST approximation, which runs in polynomial time.

1This research was supported in part by AFOSR under MURI grant F49620-97-0028.
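The power weighted MST length defined above can be computed directly for a full sample. The following is a minimal sketch using Prim's algorithm on the complete Euclidean graph; it does not implement the greedy $k$-MST approximation of [2], and the function name `mst_length` is illustrative. Because raising edge lengths to a positive power preserves their ordering, the tree minimizing the power weighted length is the ordinary Euclidean MST.

```python
import math

def mst_length(points, gamma=1.0):
    # Prim's algorithm, O(n^2); returns the sum over MST edges of |e|^gamma.
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n    # shortest known edge from the tree to each point
    best[0] = 0.0            # start the tree at the first point
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total
```

For the four corners of the unit square, `mst_length` with `gamma=1.0` returns 3.0, since the MST uses three unit edges.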

Let  v  €  (0, 1)  be  defined  by  v  =  {d  —  ^)/d  and  define 
the  statistic 

Hv{X:,k")  =  ^  In  {n-v L{X:<k))  +  0{u,  d)  (2) 


where  the  minimization  is  performed  over  all  d- 
dimensional  Borel  subsets  of  [0,  l]d  having  probability 
P(A)  —  fA  f0(x)dx  >  a. 

Let  /  follow  the  mixture  model 

/  =  (1  -  e)/r  H-  (4) 

where  f0  is  a  known  outlier  density  and  fi,  e  £  [0, 1]  are 
unknown.  Then  for  small  e  and  a  close  to  one  it  can 
easily  be  shown  that  the  right  hand  side  of  (3),  which  is 
/*(/,/«,),  is  to  a  close  approximation  h(fi,fo)-  Thus 
H„W,k)  is  a  robust  estimator  of  h{fi,  fo)- 

Note  the  following:  the  estimator  Hv(yn,k)  does  not 
require  performing  the  difficult  step  of  density  estima¬ 
tion;  estimates  of  various  orders  v  of  h,  can  be  obtained 
by  varying  the  edge  power  exponent;  the  sequence  of 
trees  Vn,2,  •  •  •  34, n  —  yn  provides  a  natural  extension  of 
rank  order  statistics  for  multidimensional  data.  Here 
k  plays  the  same  role  as  the  parameter  a  in  the  a- 
trimmed  mean  estimator  for  1-dimensional  data. 
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where  0  is  a  constant  equal  to  the  t»-th  order  R6nyi 
entropy  of  the  uniform  density  on  [0,  ljd.  Let  G(x)  be 
the  coordinate  transformation  on  [0,  l]d  which  maps  the 
reference  distribution  f0  to  a  uniform  distribution  and 
define  the  transformed  data  sample  yn  —  G(Xn).  Then 
using  the  results  of  [2]  it  can  be  shown  that  ^(Tn.n) 
is  an  a.s.  consistent  estimator  of  the  I-divergence  (1). 
Furthermore,  with  a  =  k/n,  H„(y„  k)  is  an  a-trimmed 
estimator  of  I-divergence  in  the  sense  that 

^(X,fc)-t  min  -i—ln  /"  ( fo(x)dx  {a.s.)  (3) 

A:P(A)>a  1  -  V  JA\Jo{X)J 
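The statistic (2) can be illustrated with a short sketch. It is not the greedy k-MST procedure of the paper: it uses the full minimal spanning tree (the case k = n), computed with Prim's algorithm, and it treats the constant β as a caller-supplied placeholder because its value is not given here. The function names `mst_weight` and `entropy_statistic` are hypothetical.

```python
import math


def mst_weight(points, gamma):
    """Sum of (edge length)**gamma over the Euclidean MST (Prim's algorithm).

    For gamma > 0 the power-weighted MST uses the same edges as the
    ordinary Euclidean MST, so distances are compared unweighted.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
    return total


def entropy_statistic(points, gamma, beta=0.0):
    """Statistic (2) with k = n; beta is an assumed placeholder constant."""
    n = len(points)
    d = len(points[0])
    nu = (d - gamma) / d
    return math.log(n ** (-nu) * mst_weight(points, gamma)) / (1.0 - nu) + beta


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(mst_weight(square, 1.0), entropy_statistic(square, 1.0))
```

Varying `gamma` changes the order ν = (d − γ)/d of the estimate, which is the mechanism the text refers to as varying the edge power exponent.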
Abstract — A channel decoder employing the a posteriori probability (APP) algorithm can be formulated so that its inputs and its outputs are log-likelihood ratios (LLRs): channel LLRs of the code bits are accepted, and a posteriori LLRs of the info bits and/or the code bits are delivered. Since decoding improves the reliability, the APP algorithm can be interpreted as a non-linear filter for LLRs. The “LLR amplification” depends on the distance properties of the channel code; for high signal-to-noise ratios it is dominated by the minimum distance.


Summary 

The APP algorithm [1] accepts a priori probabilities and channel probabilities as inputs and delivers a posteriori probabilities as outputs. With additional computation of soft outputs for the code bits [2] [3] and with usage of LLRs instead of probabilities [4], it can be extended to the logarithmic APP (LogAPP).

Consider a binary linear convolutional encoder of rate R = k/n. Let e denote the path through the trellis associated with the info word $\mathbf{u}(e)$ and the code word $\mathbf{x}(e)$, $u, x \in \{+1, -1\}$. The code bits are transmitted over a memoryless channel; the received value of a single bit is denoted by $y$, and the received word is denoted by $\mathbf{y}$.

The LogAPP algorithm takes the a priori LLRs of the info bits U and the channel LLRs of the code bits X,

$$L^-(U) = \ln\frac{P(U=+1)}{P(U=-1)}, \qquad L^-(X) = \ln\frac{P(X=+1\mid y)}{P(X=-1\mid y)}, \qquad (1)$$

and computes the a posteriori LLRs of the info bits and of the code bits

$$L^+(U) = \ln\frac{P(U=+1\mid \mathbf{y})}{P(U=-1\mid \mathbf{y})}, \qquad L^+(X) = \ln\frac{P(X=+1\mid \mathbf{y})}{P(X=-1\mid \mathbf{y})}. \qquad (2)$$


These inputs and outputs of the LogAPP algorithm are depicted in Fig. 1. In the following, the info bits are assumed to be equally distributed, i.e. $L^-(U) = 0$.

[Figure: LogAPP block with inputs $L^-(U)$, $L^-(X)$ and outputs $L^+(U)$, $L^+(X)$]

Fig. 1: The input and the output LLRs of the LogAPP algorithm.
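As a minimal illustration of the quantities in (1) and (2), the sketch below computes channel LLRs for antipodal signalling over an AWGN channel, where the LLR of a bit with received value y and noise variance σ² is 2y/σ². For the a posteriori side it uses a repetition code, the simplest case in which the APP output is just the sum of the inputs; this toy code is an assumption for illustration and is not one of the convolutional codes considered here. The function names are hypothetical.

```python
def channel_llr(y, noise_var):
    """Channel LLR ln p(y|+1)/p(y|-1) for x in {+1, -1} over AWGN."""
    return 2.0 * y / noise_var


def app_llr_repetition(received, noise_var, prior_llr=0.0):
    """A posteriori LLR of the info bit of a repetition code.

    All code bits equal the info bit, so the a priori LLR and the
    channel LLRs simply add up.
    """
    return prior_llr + sum(channel_llr(y, noise_var) for y in received)


print(channel_llr(0.5, 1.0))
print(app_llr_repetition([0.5, -0.25, 1.0], 0.5))
```

In this toy case one negative channel value is outweighed by the two positive ones, so the a posteriori LLR is more reliable than any single channel LLR.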


The purpose of decoding is to improve the reliability of the bits. This motivates interpreting decoding as non-linear filtering, as mentioned in [2]. In this paper, the LogAPP is treated as a non-linear LLR filter. This point of view suggests defining an info bit LLR amplification (ILA) and a code bit LLR amplification (CLA):

$$\mathrm{ILA} = \left.\frac{E_y\, L^+(U)}{E_y\, L^-(X)}\right|_{L^-(U)}, \qquad \mathrm{CLA} = \left.\frac{E_y\, L^+(X)}{E_y\, L^-(X)}\right|_{L^-(U)}, \qquad (3)$$


where $E_y$ denotes the expected value with respect to $y$. The ILA can be regarded as the transfer function of a soft-decoder; since there are fewer output values than input values, the soft-decoder is similar to a decimator. The CLA can be regarded as the transfer function of a soft-repeater, i.e. a device which performs decoding and re-encoding using soft values.

For rate 1/2 convolutional codes with memories 2 to 8, binary transmission over an AWGN channel was simulated. In Fig. 2, the ILA and the CLA are depicted as a function of the mean channel LLR $E_y L^-(X)$ of the code bits. The following characteristics can be justified analytically:

1.  For  low  input  LLRs,  the  ILA  approaches  0  and  the  CLA 
approaches  1. 

2. For high input LLRs, both the ILA and the CLA approach a constant value which can be identified with the free distance of the code.


Fig.  2:  The  LLR  amplifications  of  the  convolutional  codes  with 
memories  2  to  8. 
Abstract — We present a fourth-moment measure of the “bandwidth” of a strictly time-limited signal and obtain a minimum-bandwidth basis for $L_2(a,b)$. Such a basis consists of orthonormal waveforms with the smallest obtainable bandwidths. The primary advantage of the fourth-moment bandwidth relative to the Root Mean Square (RMS) and Fractional Out-of-Band Energy (FOBE) measures is that its basis functions have an $O(1/f^3)$ frequency roll-off compared to the $O(1/f^2)$ and $O(1/f)$ decay of the RMS and FOBE basis functions, respectively.

I.  Main  Result 

Every strictly time-limited pulse has a spectrum which is nonzero for an infinite range of frequencies. Hence, non-strict measures of bandwidth are used to quantify the spectral concentration of such signals. Two such measures, namely the RMS and the FOBE bandwidths, have been studied in the past. In particular, it was shown that the minimum RMS and FOBE bandwidth orthonormal basis functions for $L_2(0,T)$ are sinusoids $\sin(k\pi t/T)$ for integer k [1] and the set of time-truncated prolate-spheroidal wave functions [2], respectively. In this paper we consider the fourth-moment bandwidth and obtain the corresponding minimum bandwidth orthonormal basis.

Definition 1 (Fourth-Moment Bandwidth Measure) For a baseband signal with energy spectrum $S_x(f)$, the fourth-moment bandwidth is defined as

$$\mathrm{bw}(x) = \frac{\int_{-\infty}^{\infty} f^4 S_x(f)\,df}{\int_{-\infty}^{\infty} S_x(f)\,df} \qquad (1)$$

Definition 2 (Minimum-Bandwidth Basis) Let the collection of functions $\mathcal{B} = \{\psi_k\}_{k=1}^{\infty}$ be an orthonormal basis for $L_2(-T/2, T/2)$ (the space of square-integrable functions with standard inner product), and let the bandwidth measure be defined through (1). If $\psi_k$ has the minimum bandwidth of all $L_2$ functions which are orthogonal to $\{\psi_i\}_{i=1}^{k-1}$, i.e.,

$$\psi_k = \arg\min_{x \in L_2,\; x \perp \{\psi_i\}_{i=1}^{k-1}} \mathrm{bw}(x) \qquad (2)$$

for all k, then $\mathcal{B}$ is a minimum-bandwidth basis for $L_2(-T/2, T/2)$.


The  main  result  of  this  paper  is  that  the  minimum  bandwidth  basis 
functions  for  the  fourth-moment  bandwidth  measure  are  solutions  of 
the  eigenvalue/eigenfunction  equation 

where T denotes the time-limiting operator (to the interval $[-T/2, T/2]$). The boundary conditions are imposed by requiring

1This work was supported in part by NSF Grant CCR-9706591.


Figure 1: Magnitude spectra of $\psi_1(t)$ for the FOBE case (with $BT = 2\pi$), the RMS bandwidth and the fourth-moment measure, all for T = 1.


the solutions to lie in $W_0^2(-T/2, T/2)$, which are the time-limited elements of the Sobolev space $W^2$ of functions on $\mathbb{R}$ defined as [3],

$$W^2 = \{x : \|(1+f^2)\,x(f)\|_2 < \infty\}. \qquad (4)$$

It can be shown that the eigenvalues $\gamma_k$ of (3), which are equal to the bandwidths of the respective basis functions, are given as $\gamma_k = (\phi_k/2\pi)^4$, where the $\phi_k$'s are the positive solutions to

$$\cos(\phi_k T/2)\,\sinh(\phi_k T/2) + \sin(\phi_k T/2)\,\cosh(\phi_k T/2) = 0 \qquad (5)$$

and

$$\cos(\phi_k T/2)\,\sinh(\phi_k T/2) - \sin(\phi_k T/2)\,\cosh(\phi_k T/2) = 0 \qquad (6)$$

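The eigenfunctions are given for $t \in [-T/2, T/2]$ by

$$\psi_k(t) = \begin{cases} \sqrt{\dfrac{2}{T(1+a_k^2)}}\,\big(\cos(\phi_k t) + a_k \cosh(\phi_k t)\big), & k \text{ odd} \\[2ex] \sqrt{\dfrac{2}{T(1+a_k^2)}}\,\big(\sin(\phi_k t) + a_k \sinh(\phi_k t)\big), & k \text{ even} \end{cases}$$

where $a_k = -\cos(\phi_k T/2)/\cosh(\phi_k T/2)$.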
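The transcendental equations (5) and (6) have no closed-form solutions, but their positive roots are easy to bracket numerically. The following sketch, assuming the substitution u = φT/2 and a division by cosh(u) to avoid overflow, scans for sign changes and refines each root by bisection. The function names, step size and starting point are illustrative choices, not part of the original result.

```python
import math


def g5(u):
    """Left side of (5) divided by cosh(u), with u = phi * T / 2."""
    return math.cos(u) * math.tanh(u) + math.sin(u)


def g6(u):
    """Left side of (6) divided by cosh(u), with u = phi * T / 2."""
    return math.cos(u) * math.tanh(u) - math.sin(u)


def positive_roots(g, count, step=0.01, start=0.1):
    """First `count` positive roots of g, found by scanning plus bisection."""
    roots = []
    a, ga = start, g(start)
    while len(roots) < count:
        b = a + step
        gb = g(b)
        if ga * gb < 0.0:  # sign change: a root lies in (a, b)
            lo, hi, glo = a, b, ga
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                gm = g(mid)
                if glo * gm <= 0.0:
                    hi = mid
                else:
                    lo, glo = mid, gm
            roots.append(0.5 * (lo + hi))
        a, ga = b, gb
    return roots


def fourth_moment_bandwidths(T, count):
    """Smallest `count` values of (phi_k / (2*pi))**4 from the roots of (5) and (6)."""
    half_args = positive_roots(g5, count) + positive_roots(g6, count)
    phis = sorted(2.0 * u / T for u in half_args)[:count]
    return [(phi / (2.0 * math.pi)) ** 4 for phi in phis]


print(positive_roots(g5, 2), positive_roots(g6, 2))
print(fourth_moment_bandwidths(1.0, 3))
```

The roots of (5) and (6) interleave, which is why the two root lists are merged and sorted before the eigenvalues are formed.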

A comparison of the frequency roll-off of the minimum FOBE, RMS and fourth-moment bandwidth functions can be seen in Figure 1. This figure reveals that while the minimum fourth-moment bandwidth basis function has a somewhat larger main lobe than the truncated prolate-spheroidal function and the half sinusoid, its rate of side-lobe decay is significantly better.
Abstract — The similarity of the Berlekamp-Massey (B-M) algorithm and the Welch-Berlekamp (W-B) algorithm is demonstrated by showing that both algorithms are special instances of one iterative modeling procedure. In particular, from Reed & Solomon’s original problem statement a W-B type algorithm is directly derived through a system-theoretic interpolation approach.

Reed & Solomon’s original curve fitting formulation for decoding an (n, k) Reed-Solomon code over a finite field F is readily reformulated as a minimal interpolation problem, see [1] and references therein. From this a system-theoretic formulation, involving trajectories of time $b_i : \mathbb{Z}_+ \to F^2$, can be obtained as follows. Let the code locations be given by $x_1, \ldots, x_n$, define $G(s) := (s - x_{n-k+2}) \cdots (s - x_n)$ and let $(r_1, \ldots, r_n)$ be a received word. Without restrictions we may assume that $r_{n-k+1} = \cdots = r_n = 0$. Next let trajectories $b_i$ be defined by


bi 


1 

G(Xi) 


1 

0 


0 

G(a) 


bi 


with 


.  ‘-([7] 

for i = 1, . . . , n − k + 1. Here $\sigma$ denotes the backward shift. The decoding problem can now be formulated as: find a representation with minimal row degrees

$$\begin{bmatrix} D(\sigma) & -N(\sigma) \\ K(\sigma) & -Q(\sigma) \end{bmatrix} w = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (1)$$

for the behavior B spanned by the trajectories $b_1, \ldots, b_{n-k+1}$. Thus we adopt a so-called behavioral system-theoretic approach, see [8] for more details.

The above corresponds to a slight variation of the W-B key equation in which polynomials D and N are sought with deg D minimal such that for $y_i := r_i/G(x_i)$

$$D(x_i)\,y_i = N(x_i) \qquad (2)$$

as well as $\deg N \le \deg D$ (rather than $\deg N < \deg D$).

In earlier research [3] it was shown how the B-M algorithm can be interpreted as a special instance of the general iterative modeling procedure of [8, p. 289]. Below we outline an iterative algorithm along the same lines for constructing a representation (1), thereby solving the W-B type key equation (2). Our algorithm below is thus another instance of the modeling procedure of [8]. In particular, like the B-M algorithm, it makes use of the solution’s degree L at each step to determine which type of update matrix is used. In this respect it differs from the W-B algorithm, which uses a different integer parameter.


Algorithm 

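For j = 0, . . . , n − k define

$$R_j := \begin{bmatrix} D_j & -N_j \\ K_j & -Q_j \end{bmatrix}.$$

Initially

$$R_0 := \begin{bmatrix} 1 & 0 \\ 0 & s - x_{n-k+1} \end{bmatrix}$$

and $L_0 := 0$.

Proceed iteratively as follows for j = 1, . . . , n − k. Compute, after processing $(x_i, y_i)$ for i = 0, . . . , j, the numbers $\Delta_j$ and $\Gamma_j$ as follows:

$$\Delta_j := D_{j-1}(x_j)\,y_j - N_{j-1}(x_j)$$

$$\Gamma_j := K_{j-1}(x_j)\,y_j - Q_{j-1}(x_j).$$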


Compute the matrix $R_j$ and the integer $L_j$ as follows:
$R_j := V_j R_{j-1}$,

where, if $\Delta_j \ne 0$ and ($L_{j-1} < j/2$ or $\Gamma_j = 0$),


Vj(s):  = 


and,  if  otherwise, 


0 

1 


Vj(s)  := 


1 

0 


Lj  Lj- 1  +  1, 


Lj 


Lj-\. 


Topics of further research include the derivation of insightful and efficient algorithms (see also [2, 5]) for list decoding based on a behavioral modeling view.
Abstract — A pipelined finite-field multiplier structure in conjunction with a single systolic array implementation of the Berlekamp-Massey algorithm leads to a highly parallel decoder architecture in which the critical path delay is an order of magnitude smaller than the path delays of conventional architectures.


Introduction 


The Berlekamp-Massey algorithm [1] is an efficient iterative method for solving the key equation in BCH decoding that relates the (unknown) error locator polynomial $\Lambda(z)$ and error evaluator polynomial $\omega(z)$ to the (known) syndrome polynomial $S(z)$. In the r-th iteration, the algorithm computes the r-th discrepancy $\Delta_r$, and then updates its estimate of $\Lambda(z)$ and a “scratch” polynomial $B(z)$. Even in highly parallel implementations, the speed bottleneck is this iterative loop, which requires a multiplication (while computing $\Delta_r$) followed by a division (while updating $B(z)$): the latter is more time-consuming than the former. Fortunately, the division can be replaced by a multiplication as described in [3]. Similarly, the two serial multiplications can be carried out in parallel if $\Delta_{r+1}$ (which is to be used in the next iteration) is computed at the same time as the polynomials are being updated in the current iteration. This gives the following algorithm, implementable with a single systolic array, for a t-error-correcting BCH code:
Initialization: $\Delta^{(0)}(z) = \Theta^{(0)}(z) = S(z) + z^{3t}$
for r = 0 until 2t − 1 do

A(r+1)(z) -  _ a £r)e(r)(*) 

(e1'+,)(*),7,,+1>)  =  f  <eWW’yr,) 


.(0) 


1. 


Output: $\Lambda(z) = \lfloor \Delta^{(2t)}(z)/z^t \rfloor$, $\omega(z) = \Delta^{(2t)}(z) \bmod z^t$.


Here, $\lfloor a(z)/z^s \rfloor$ denotes the quotient when $a(z)$ is divided by $z^s$. This is readily implemented by shifting when s = 1, whereas the output polynomials are merely different parts of the $\Delta$ register. Note that $\Delta_0^{(r)} = \Delta^{(r)}(0)$ is the r-th discrepancy and it is always the low-order symbol in the $\Delta$ register.1


High-Speed  Implementations 

VLSI implementations of the algorithm described above can be expected to operate roughly twice as fast as the implementation in [3]. Even faster implementations are possible for block-interleaved codes, provided that decoding is completed prior to de-interleaving. For a code interleaved to depth M, the decoder structure is the same systolic array except that

1After discovering this result, we found that it had been published already in [4]. It also appears in [2].


each storage cell now consists of a serial M-stage register. The critical path delay is no different from that in the original circuit. However, the results of a polynomial update are not required during the next M − 1 cycles while other (interleaved) codewords are being processed. This allows the use of a pipelined multiplier that computes the product of two elements of GF($2^m$) in m clock cycles (assuming that M > m).

Let $Y = y_0 + y_1\alpha + y_2\alpha^2 + \cdots + y_{m-1}\alpha^{m-1}$ be an element of GF($2^m$). The pipelined multiplier architecture is based on writing the product of X and Y as

$$X(y_0 + y_1\alpha + y_2\alpha^2 + \cdots + y_{m-1}\alpha^{m-1}) = Xy_0 + (X\alpha)y_1 + ((X\alpha)\alpha)y_2 + \cdots + (\cdots((X\alpha)\alpha)\cdots\alpha)\,y_{m-1}$$

which can be computed by adding X into an empty accumulator (or not) according as $y_0$ is 1 (or 0). Simultaneously, X is multiplied by $\alpha$ to produce $X\alpha$. Then, $X\alpha$ is either added (or not) to the accumulator according as $y_1$ is 1 (or 0), while simultaneously, $X\alpha$ is multiplied by $\alpha$ to produce $X\alpha^2$, and so on for m stages. Multiplication by $\alpha$ is easy to implement, and thus the critical path for this new decoder architecture passes through only one Exclusive OR (XOR) gate and one 2-to-1 multiplexer. This is an order of magnitude smaller than the delay in a conventional multiplier.
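The stage-by-stage product described above can be mimicked in software. The sketch below is a bit-level model, not a description of the hardware: each loop pass plays the role of one pipeline stage, conditionally XORing the running multiple of X into the accumulator and then multiplying that multiple by α. The choice of GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is an assumption for the example, as is the function name.

```python
def gf_mul_serial(x, y, m=8, poly=0x11D):
    """Multiply x and y in GF(2^m), consuming the bits of y low-order first.

    Elements are integers whose bit i is the coefficient of alpha**i.
    `poly` is the field's reduction polynomial including the x**m term;
    0x11D is an assumed choice for GF(2^8).
    """
    acc = 0
    for i in range(m):
        if (y >> i) & 1:      # stage i: add X * alpha**i if y_i = 1
            acc ^= x
        x <<= 1               # multiply the running term by alpha ...
        if x & (1 << m):      # ... and reduce modulo the field polynomial
            x ^= poly
    return acc


print(hex(gf_mul_serial(0x80, 0x02)))
```

Each stage involves only a conditional XOR and a shift with an optional reduction, which corresponds to the short critical path noted in the text.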

Ignoring wiring delays and other non-idealities, a 0.18 μm CMOS technology Reed-Solomon decoder over GF($2^8$) has critical path delays of 6.8 ns, 3.0 ns, and 0.36 ns respectively for the implementations described in [3], [4] and this paper. Decoding at rates exceeding a gigabyte per second appears to be feasible with the decoder implementation described above. Details of the proposed architecture can be found in [5].
Abstract — We consider two methods based on Euclid’s algorithm to solve the Linear Feedback Shift Register (LFSR) synthesis problem. One of the methods is identically equivalent to the celebrated Berlekamp-Massey (B-M) algorithm. The other method is distinctly Euclidean. The formulation of the problem from Euclid’s algorithm leads to the characterization of the LFSR synthesis for the reverse sequence given the LFSR synthesis for the forward sequence.

I.  LFSR  Synthesis  problem 

All polynomials and sequences considered in this paper are over a finite field F. Let deg(P) denote the degree of the polynomial P and Coeff(P, l) denote the coefficient of $x^l$ in P. Let $S^N = (s_0, s_1, \ldots, s_{N-1})$ denote a sequence of length N over F. The LFSR synthesis problem is, given $S^N$, to find a shortest-length LFSR (of length L) satisfying the recursion:

$$s_j = -\sum_{i=0}^{L-1} c_i\, s_{j-L+i}, \qquad L \le j < N,\; c_i \in F. \qquad (1)$$
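Recursion (1) is straightforward to check for a candidate LFSR. The following sketch, restricted to a prime field GF(p) represented by integers modulo p (an assumption; the paper allows any finite field), tests whether coefficients $c_0, \ldots, c_{L-1}$ generate a given sequence. The function name is hypothetical.

```python
def lfsr_generates(c, s, p):
    """Return True if the LFSR with coefficients c satisfies recursion (1) on s.

    c = [c_0, ..., c_{L-1}] and s = [s_0, ..., s_{N-1}] hold integers
    modulo the prime p; the recursion is checked for L <= j < N.
    """
    L = len(c)
    for j in range(L, len(s)):
        predicted = -sum(c[i] * s[j - L + i] for i in range(L))
        if (s[j] - predicted) % p != 0:
            return False
    return True


# Over GF(2), c = (1, 1) gives s_j = s_{j-2} + s_{j-1}.
print(lfsr_generates([1, 1], [1, 1, 0, 1, 1, 0], 2))
print(lfsr_generates([1, 1], [1, 1, 1, 1], 2))
```

A synthesis algorithm searches for the shortest `c` that passes this check; the sketch only verifies a given candidate.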

A  Algorithm  A 

Here we represent the sequence $S^N$ as $s_{N-1} + s_{N-2}x + \cdots + s_1 x^{N-2} + s_0 x^{N-1}$ and compute recursively using the GCD algorithm the minimal connection polynomial $C(x) = c_0 + c_1 x + \cdots + c_{L-1}x^{L-1} + x^L$, where the $c_i$'s are as in (1). We denote the LFSR of length L by a polynomial pair $(C(x), B(x))$, where $B(x) \equiv S(x)C(x) \pmod{x^N}$. This is the version of Euclid’s algorithm used by Dornstetter in [1] to prove the equivalence with the B-M algorithm.

B  Algorithm-B 

Let a sequence $S^N$ be represented by the polynomial $S(x) = s_0 + s_1 x + \cdots + s_{N-1}x^{N-1}$, and let the corresponding connection polynomial $D(x)$ be given by $D(x) = 1 + c_{L-1}x + \cdots + c_0 x^L$, where the $c_i$'s are as in (1). Note that the polynomials $S(x)$ and $D(x)$ given above happen to be the reciprocal polynomials of the corresponding polynomials defined in Algorithm-A. The following theorem supports the LFSR synthesis.

Theorem 1 An LFSR of length L with a connection polynomial D(x) of degree L generates $S^N$ if and only if there exists a polynomial B(x) such that

$$B(x) \equiv S(x)D(x) \pmod{x^N}, \quad \deg(B(x)) < L, \quad \mathrm{Coeff}(D, 0) = 1. \qquad (2)$$

1This  work  was  supported  by  Australian  Research  Council  Large 
Grant  #449701206 


We denote the LFSR of length L by a polynomial pair [D(x), B(x)] in this representation. The above theorem, in conjunction with Euclid’s GCD evaluation of the polynomials S(x) and $x^N$, results in Algorithm B. At each iteration the algorithm provides a valid LFSR representation of (2) with the length max{deg(D(x)), 1 + deg(B(x))}. A minimal solution is chosen when the length of the LFSR is minimal.

Remark: In Algorithm-B, the length of the LFSR at each step monotonically decreases to the minimum value ($L^0 = N$). On the other hand, in Algorithm-A, which resembles the B-M algorithm, the length of the LFSR at each iteration gradually increases to the minimal value of L from 0.

The next theorem characterizes the length of the LFSR design for the reverse sequence.

Theorem 2 Let us consider the shortest LFSR $(C^{(2m)}, B^{(2m)})$ of length $L^{(2m)}$ that generates the sequence $S^{(2m)} = (s_1, s_2, \ldots, s_{2m})$. Let $(\tilde{C}^{(2m)}, \tilde{B}^{(2m)})$ be the shortest LFSR, of length $\tilde{L}^{(2m)}$, for the reverse sequence $\tilde{S}^{(2m)} = (s_{2m}, s_{2m-1}, \ldots, s_1)$. If $L^{(2m)} \le m$, then
L< 2m>  =  L(2m)  ifCoeff(C2m,  0)  #  0; 
lS 2m)  =  2m  —  lS2m'>  +  1  otherwise. 

If  L(2m)  >  m,  then 

i(2m)  =  L(2m)  if  Coef  f(C2m  ,0)  #  0; 

L<  2m>  =  2m  —  l/2m'  +  1  otherwise. 

The above theorem characterizes completely the length of the LFSR for the reverse sequence given the design for the forward sequence. This generalizes a result in [2] which considers the length of the reverse sequence only for a particular case of sequences whose complexity is exactly half of the sequence length ($L^{(2m)} = m$). Also observe that, in our approach, the designs for the reverse sequence can be obtained by Euclid’s algorithm.

Even though the similarities between the B-M and other algorithms have been studied extensively, the viewpoint of this paper concerning the LFSR synthesis procedures using Euclid’s algorithm seems to be new.
Abstract — When a Hermitian curve is put in a special position with respect to its infinite rational point, the (x, y)-coordinates of all finite rational points of the Hermitian curve have regular algebraic properties. Based on these properties, the use of Horner’s rule and the mechanism of Chien search in the decoding of Reed-Solomon codes can be extended to yield efficient architectures for syndrome generation and error location search in the decoding of codes constructed from the Hermitian curve. [1, 2, 3]

I.  Introduction 

It is well known that syndrome generation in the decoding of Reed-Solomon codes is usually executed by Horner’s rule, which has a regular hardware structure and is suitable for VLSI implementation. The first goal of this paper is to extend the use of Horner’s rule to the decoding of Hermitian codes. The intuitive idea of error-locator searching is to evaluate the error-locator polynomial at each finite rational point of the Hermitian curve. If the value is zero, then an error location is found. It would be desirable to have an architecture like the mechanism of Chien search, which generates all finite rational points of the Hermitian curve and evaluates the error-locator polynomial at these points very efficiently; this is the second goal of this paper.

II.  Rational  Points  in  a  Hermitian  Curve 

Let $E_r = \{\infty, 0, 1, \ldots, r-2\}$ be a linearly ordered set with $\infty < 0 < 1 < \cdots < r-2$. Assume that $\alpha$ is a primitive element of GF($q^2$) and $\beta = \alpha^{q+1}$. For convenience, we define $\alpha^{\infty} = 0$. The set $\Omega$ of (x, y)-coordinates of all finite rational points of the Hermitian curve $\mathcal{H}_q$ over GF($q^2$), defined by $x^{q+1} = y^q + y$, can be shown to be $\Omega = \bigcup_{m \in E_q} X_m \times Y_m$, where

$$X_m = \begin{cases} \{0\}, & \text{if } m = \infty, \\ \{\alpha^{m+k(q-1)} : k = 0, 1, \ldots, q\}, & \text{if } m \in [0, q-2], \end{cases} \qquad (1)$$

and $Y_m$ is the solution set of the equation $y^q + y = \beta^m$, $\forall m \in E_q$. Define $\Omega^{(i)}$ to be the intersection of the Hermitian curve with the line $y = \alpha^i$ on the affine plane over GF($q^2$) and $\lambda_i$ be m if $\alpha^i$ is in $Y_m$, for each $i \in E_{q^2}$. Now, we are able to identify a point $(\alpha^{\lambda_i + k(q-1)}, \alpha^i)$ in the y-slice $\Omega^{(i)}$ by the pair (k, i), denote this point as $P_{(k,i)}$ for each i in $E_{q^2}$, and order the points $P_{(k,i)}$ in $\Omega$ according to a lexicographical ordering on the index pairs (k, i): $(k', i') <_l (k, i)$ if $i' < i$ or if $i' = i$ and $k' < k$, where $\infty < 0 < 1 < \cdots < q^2 - 2$.
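The decomposition of the point set into products $X_m \times Y_m$ can be checked by brute force in the smallest case. The sketch below assumes q = 2, so the field is GF(4), represented here by the integers 0 to 3 with α = 2 and α² = α + 1; this representation and all function names are choices made for the example. It enumerates the solutions of $x^{q+1} = y^q + y$ directly and compares them with the union of the products.

```python
Q = 2            # curve parameter q; the field is GF(q^2) = GF(4)
ALPHA = 2        # primitive element alpha in the integer representation
FIELD = range(4)


def gf4_mul(a, b):
    """Multiply in GF(4): carry-less product reduced modulo x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:
        r ^= 0b111
    return r


def gf4_pow(a, e):
    """Raise a to the non-negative integer power e in GF(4)."""
    r = 1
    for _ in range(e):
        r = gf4_mul(r, a)
    return r


def curve_points():
    """All (x, y) in GF(4)^2 with x^(q+1) = y^q + y (addition is XOR)."""
    return {(x, y) for x in FIELD for y in FIELD
            if gf4_pow(x, Q + 1) == (gf4_pow(y, Q) ^ y)}


def decomposed_points():
    """Union of X_m x Y_m over m in E_q = {infinity, 0, ..., q-2}."""
    beta = gf4_pow(ALPHA, Q + 1)
    points = set()
    for m in [None] + list(range(Q - 1)):   # None stands for infinity
        rhs = 0 if m is None else gf4_pow(beta, m)
        xs = {0} if m is None else {gf4_pow(ALPHA, m + k * (Q - 1))
                                    for k in range(Q + 1)}
        ys = {y for y in FIELD if (gf4_pow(y, Q) ^ y) == rhs}
        points |= {(x, y) for x in xs for y in ys}
    return points


print(sorted(curve_points()))
print(curve_points() == decomposed_points())
```

For q = 2 the enumeration yields $q^3 = 8$ finite rational points, two with x = 0 and six with x ≠ 0.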

III. Syndrome Generation

1This  work  was  supported  by  the  National  Science  Council,  Tai¬ 
wan,  under  contract  no.  NSC86-2221-E-007-022.  The  authors  are 
University, Hsinchu 30065, Taiwan, email: cclu@ee.nthu.edu.tw.


For our purposes, let S(a, b) be the syndrome of the Hermitian code $H_m$ associated with the monomial $x^a y^b$ for $a, b \ge 0$ and $aq + b(q+1) \le m$. It can be shown that

$$S(a,b) = \sum_{P \in \Omega} r_P\, x(P)^a\, y(P)^b = \sum_{i \in E_{q^2}} c_{a,i}\, (\alpha^i)^b,$$

where $\{r_P\}_{P \in \Omega}$ is the received symbol sequence and $c_{a,i} = \sum_{P \in \Omega^{(i)}} r_P\, x(P)^a$ for each $i \in E_{q^2}$. If $\lambda_i = \infty$, we have $c_{a,i} = 0$, and if $\lambda_i \ne \infty$, $c_{a,i} = \alpha^{a\lambda_i} \sum_{k=0}^{q} r_{k,i}\, (\alpha^{a(q-1)})^k$, where we denote $r_P$ as $r_{k,i}$ when the two-dimensional index of P is (k, i). Thus the generation of the syndrome S(a, b) can be implemented by a Horner’s double-loop with feedback gains $\alpha^{a(q-1)}$ and $\alpha^b$ respectively.

IV.  Error  Location  Search 

Here we re-index the $q^3$ finite rational points through a unique order-preserving transformation T from the two-dimensional index set onto $(E_{q^3}, <)$. Let $\sigma(x,y) = \sum_{t=1}^{n} \sigma_t x^{a_t} y^{b_t}$ be an error-locator polynomial with $a_1 = b_1 = 0$. Since the first point $P_\infty$ has coordinates (0, 0), we have $\sigma(P_\infty) = \sigma_1$, and then $P_\infty$ is an error location if and only if $\sigma_1 = 0$. At all other finite rational points $P_j$, $0 \le j \le q^3 - 2$, we have $\sigma(P_j) = \sum_{t=1}^{n} F_t(P_j) = \sum_{t=1}^{n} \mathrm{Loop}_t(j) \cdot \mathrm{Select}_t(j)$. The sequence $\{\mathrm{Select}_t(j)\}_{0 \le j \le q^3-2}$ can be implemented by a multiplier-selection unit with a control signal, and the sequence $\{\mathrm{Loop}_t(j)\}_{0 \le j \le q^3-2}$ can be generated sequentially, as in the mechanism of Chien search, by a closed loop with initial value $\sigma_t$ and with three feedback gains, including $\alpha^{a_t(q-1)}$ and $\alpha^{b_t}$, selected by a multiplexer with a control signal.

V.  Conclusion 

In this paper, we exploit a specific ordering of all finite rational points of a Hermitian curve and extend the use of Horner’s rule and the mechanism of Chien search in the decoding of Reed-Solomon codes to the decoding of Hermitian codes. The basic building blocks in the proposed architectures are loops and multiplier-selection units. Since the multipliers used are all special-purpose, the hardware complexity is quite acceptable. Moreover, the number of clock cycles needed to complete a cycle of syndrome generation or a cycle of error-location search is just the length of a received codeword. In conclusion, the two goals of this paper, stated in the introduction, are achieved.
I. Description and performance results
The turbo principle [Hag97] reaches far beyond the original turbo code concept. Encoded DPSK transmission, e.g., can be viewed as a serial concatenation of a channel code with a rate-one code representing the DPSK modulation. Iterative decoding of this concatenated scheme shows surprisingly good performance (see [PSH97, Hoe99]). We describe the serial concatenation of interleaved tail-biting convolutional codes (TBCC) with this DPSK rate-one code transmitted over AWGN and flat Rayleigh fading channels. The receiver performs time-continuous “turbo” iterations between the inner and outer codes and is realized by two analog ring networks connected via an interleaver ring which exchanges extrinsic information by means of analog signals that are continuous in time and value. Employing analog circuits is advantageous, since they are much faster and consume significantly less power. The feasibility of analog circuits replacing the Viterbi or BCJR algorithm was shown in [Hag98], [HOM99] and [Loel98]. In the meantime the first VLSI chips for simple decoders have been produced at Lucent/TU Munich [MGY00] and Endora/ETH Zurich.




Figure  1:  Structure  of  the  analog  DECPSK  receiver  corre¬ 
sponding  to  one  trellis  section. 

The key element of the analog networks is the representation of the log-likelihood ratio (LLR) of a binary random variable, $L(x) = \log(P(x = +1)/P(x = -1))$, by voltages. The ’box-plus’ element $\boxplus$ defined by

$$L(x_1) \boxplus L(x_2) = L(x_1 \oplus x_2) = 2\,\mathrm{atanh}\big(\tanh(L(x_1)/2) \cdot \tanh(L(x_2)/2)\big)$$

can then be realized by a simple circuit consisting of 9 transistors [MGY00]. Using these elements plus summations we design an analog turbo receiver for the concatenation of DECPSK modulation and a memory-2 TBCC. One section of the DECPSK receiver (processing one trellis section) is shown in Fig. 1. Note that this results in a delay-less nonlinear bidirectional circuit connected to the receiver values, the neighboring circuits and the interleaver ring. The channel provides us with the weighted matched filter outputs L(x.) and the interleaver ring with the extrinsic LLRs $L(b_i)$ from the decoder circuit of the TBCC. Simulation results are obtained by solving nonlinear differential equations. Fig. 2 shows that the analog network performs similarly to a digital turbo decoder applying 20 iterations. In addition, we discuss implementation issues of analog VLSI, showing their advantages in terms of power consumption and speed.
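The box-plus operation that the analog circuit realizes can be written down directly in software. The sketch below gives the tanh form used in the text together with the algebraically equivalent logarithmic form, as a numerical cross-check. The function names are hypothetical, and the code models only the arithmetic, not the transistor circuit.

```python
import math


def boxplus(l1, l2):
    """LLR of x1 (+) x2 from the LLRs of x1 and x2, tanh form.

    For very large |l1| and |l2| the tanh product rounds to +/-1.0 and
    math.atanh raises ValueError; a practical decoder would clip inputs.
    """
    return 2.0 * math.atanh(math.tanh(l1 / 2.0) * math.tanh(l2 / 2.0))


def boxplus_log(l1, l2):
    """Equivalent form ln((1 + e^(l1+l2)) / (e^l1 + e^l2))."""
    return math.log((1.0 + math.exp(l1 + l2)) / (math.exp(l1) + math.exp(l2)))


print(boxplus(1.2, -0.7), boxplus_log(1.2, -0.7))
```

The magnitude of the result never exceeds the smaller input magnitude, and its sign is the product of the input signs, which matches the intuition that the sum of two bits is no more reliable than the less reliable bit.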


Figure  2:  Bit  error  rate  versus  signal-to-noise  ratio  for  de¬ 
coding  of  DECPSK  serially  concatenated  with  tail-biting  con¬ 
volutional  codes  applying  a  digital  iterative  decoder  and  an 
analog  decoder. 
Abstract — Several popular, suboptimal algorithms for bit decoding of binary block codes, such as turbo decoding, threshold decoding, and message passing for LDPC codes, were developed almost as a common-sense approach to decoding of some specially designed codes. We explain exactly how they approximate the optimal decoding algorithm, and show how good this approximation is in some special cases.

We propose an entirely new approach to the problem of iterative decoding, which is algebraic in nature and derives the well-known suboptimal algorithms with the bit-optimal algorithm as a starting point. This approach gives new insights into the issues of iterative decoding.

We are concerned with a binary block code C defined by its parity-check matrix $H = \{h_{ij}\}_{(n-k)\times n}$, i.e., by the group generators $h_i = \{h_{ij}\}_{1\times n}$, $i \in \mathcal{I}$, of its dual code $C'$, where $\mathcal{I}$ is used to denote the index set $\mathcal{I} = \{0, 1, \ldots, n-k-1\}$. We consider suboptimal decoding algorithms for a binary code C, e.g., a turbo decoding scheme (as introduced in [1], [2]) with two component codes $C_1$ and $C_2$ defined by their dual codes’ sets of generators $\mathcal{I}_1$ and $\mathcal{I}_2$ such that $\mathcal{I}_1 \cap \mathcal{I}_2 = \emptyset$ and $\mathcal{I}_1 \cup \mathcal{I}_2 = \mathcal{I}$.

The  channel  is  assumed  to  be  memoryless  and  codewords 
equiprobable.  We  derive  an  expression  for  optimal  decoding 
of  the  whole  code  C  based  on  the  dual  code  (as  in  [3]),  and 
then  rewrite  it  so  that  it  explicitly  involves  only  Thus, 
for the log-likelihood of bit m over code C, $L_{c_m} = \log[P(c_m = 0 \mid r)/P(c_m = 1 \mid r)]$, we obtain the following:

itbiiKi 

1.x  iex  i= 0 

Lcm=  log  -flog - - -  (!) 

iez  j—o 

jjtm 

where $\Lambda_j = [p(r_j \mid 0) - p(r_j \mid 1)]/[p(r_j \mid 0) + p(r_j \mid 1)]$ and

A<  ®  Aj  =  (A<  •  A7)1-4ii  =  | 

We  then  show  that  the  turbo  decoding  algorithm  can  be 
represented  as  shown  in  the  figure  below: 

1This work was supported by the 1999 German-American Networking Research Grant given by the national academies of engineering of Germany and the USA.
The corresponding expression for the log-likelihood of bit m of the turbo decoding algorithm at iteration ($\nu$ + 1), given by


[LCmUr=  logj-ti^k  +  log 

1  [Am  \v 


n.biw] 

i£Z 1  i=0 

_  i^rn _ 

n. 


is  then  compared  to  the  optimal  solution  (1). 

A similar analysis can be done for Gallager's message passing
algorithms for his LDPC codes as well as for threshold
decoding.

Abstract — An efficient algorithm for MAP decoding
is presented. The proposed algorithm is a
hybrid of the conventional BCJR algorithm and the
recursive-MAP (r-MAP) algorithm previously
proposed by the authors. The r-MAP algorithm uses
structural properties of linear codes to reduce the decoding
complexity, but does not work well for high-rate
codes. The proposed algorithm overcomes this
defect, and achieves smaller decoding complexity than
the BCJR and r-MAP algorithms for any code.

I.  Background 

Maximum a posteriori (MAP) decoding plays an essential
role in the decoding of turbo codes, and the complexity
of a MAP decoding algorithm is significant for the realization of
efficient turbo decoders. The BCJR algorithm [1] is known as
the conventional algorithm for MAP decoding, but its
complexity is high because it requires constructing
the entire trellis diagram of the code. The authors have
proposed the recursive-MAP (r-MAP) algorithm in [3]. The
r-MAP algorithm uses structural properties of linear codes,
and succeeds in reducing the (time and space) complexity of
MAP decoding for relatively low-rate codes. However, the
r-MAP algorithm is inefficient for high-rate codes. Liu et al.
consider applying the BCJR algorithm to a section trellis
diagram, and show that the decoding complexity is reduced [2].

II.  Proposed  Algorithm 

Let $C$ be an $(n, k)$ binary linear code, and let $C_{xy}$ be the set
of codewords of $C$ such that the first $x$ and the last $n - y$
symbols are all zero. Also let $p_{xy}(C)$ be the set of vectors
obtained by deleting the first $x$ and the last $n - y$ symbols
of each codeword of $C$. Let $L_{xy}$ be the set of cosets of $p_{xy}(C_{xy})$
in $p_{xy}(C)$, and define $D_{xy}^{i,b}$ with $D_{xy} \in L_{xy}$, $x < i \le y$ and
$b \in \{0, 1\}$ as the set of vectors in $D_{xy}$ such that the $(i - x)$-th
symbol of the vector is $b$. Define

$$\mathrm{MAP}(D_{xy}, i, b) = \sum_{v \in D_{xy}^{i,b}} \prod_{x < j \le y} \Pr_j(v_j) \cdot \Pr(r_j \mid v_j)$$

for $x < i \le y$ and $b \in \{0, 1\}$, where $\Pr_j(v_j)$ is the a priori
probability that the symbol $v_j$ is chosen at the $j$-th bit
position. If the $j$-th bit position is a parity symbol, then
$\Pr_j(v_j) = 1$. The MAP table for $D_{xy} \in L_{xy}$ is a table which
contains $\mathrm{MAP}(D_{xy}, i, b)$ for $x < i \le y$ and $b \in \{0, 1\}$. In the
r-MAP algorithm, the MAP tables are constructed in a divide-and-conquer
manner: for short sections (i.e., $y - x$ is small),
the MAP tables are constructed in a rather straightforward
way. Otherwise, the MAP tables are computed recursively,
by decomposing the coset $D_{xy}$ into cosets in $L_{xz}$ and $L_{zy}$,
where $x < z < y$, computing the (smaller) MAP tables for the
decomposed cosets, and combining the computed (smaller)
MAP tables. By this approach, the decoding complexity is
reduced significantly for low-rate codes. However, for high-rate
codes, the complexity of the r-MAP algorithm is larger
than that of the BCJR algorithm. Careful analysis of the
r-MAP algorithm shows that the complexity incurred
at recursion levels two or three is considerably large.
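To make the MAP table definition concrete, the following sketch computes it by brute-force enumeration for a tiny code, treating the whole code as a single section (x = 0, y = n). This is only the "straightforward" base case, not the recursive r-MAP construction or the proposed hybrid; the function names, the (3, 2) single-parity-check example and the likelihood values are assumptions chosen for illustration.

```python
from itertools import product


def codewords(generator):
    """Enumerate all codewords of the binary code spanned by the generator rows."""
    k, n = len(generator), len(generator[0])
    return {
        tuple(sum(m * row[j] for m, row in zip(msg, generator)) % 2 for j in range(n))
        for msg in product((0, 1), repeat=k)
    }


def map_table(D, x, y, prior, likelihood):
    """Brute-force MAP(D, i, b) for x < i <= y and b in {0, 1}.

    D holds vectors of length y - x; prior(j, bit) and likelihood(j, bit)
    play the roles of Pr_j(v_j) and Pr(r_j | v_j) for bit position j.
    """
    positions = range(x + 1, y + 1)
    table = {(i, b): 0.0 for i in positions for b in (0, 1)}
    for v in D:
        weight = 1.0
        for j, bit in zip(positions, v):
            weight *= prior(j, bit) * likelihood(j, bit)
        # The vector v contributes its weight to MAP(D, j, v_j) at every position j.
        for j, bit in zip(positions, v):
            table[(j, bit)] += weight
    return table


# Usage: (3, 2) single-parity-check code with made-up channel likelihoods.
G = [[1, 0, 1], [0, 1, 1]]
lik = {1: (0.9, 0.1), 2: (0.8, 0.2), 3: (0.3, 0.7)}
table = map_table(codewords(G), 0, 3, lambda j, b: 1.0, lambda j, b: lik[j][b])
```

The enumeration costs one product per vector in the coset, which is why it is only practical for short sections; the divide-and-conquer step described above exists to avoid it for longer ones.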

We consider a hybrid algorithm of the r-MAP and the
BCJR algorithms. In the proposed hybrid algorithm, MAP
tables are constructed in a bottom-up manner, as in the
r-MAP algorithm. When MAP tables for a reasonable section
length have been built up, we switch to the BCJR algorithm. A
section trellis diagram with appropriate section boundaries is
considered, and MAP tables are associated with the composite
branches of the section trellis. Then, a BCJR-like algorithm
is executed to obtain the MAP table for the code $C$.

III.  Evaluation 

The decoding complexity of the proposed algorithm depends
on the section boundaries at which the algorithms are switched.
The BCJR and the (pure) r-MAP algorithms can be regarded
as special cases of the proposed algorithm. Therefore, the
decoding complexity of the proposed algorithm cannot be
worse than the complexity of the BCJR and r-MAP algorithms.
Table 1 compares the decoding complexity of the
BCJR, r-MAP and the proposed algorithms with the best known
sectionalization. The table shows the number of
multiplications of probabilities necessary for one decoding. The
proposed algorithm is more efficient than the other algorithms.

As future work, a systematic way of finding the optimum
sectionalization must be investigated.


Table 1: The decoding complexity.

code*            BCJR          r-MAP          proposed
RM(64,22)        1,500,132     174,464        116,096
RM(64,42)        2,197,476     4,492,672      529,792
eBCH(64,16)_j    2,860,004     120,192        120,192
eBCH(64,36)_c    56,098,788    105,056,640    20,670,592
eBCH(64,45)_r    3,000,292     15,525,680     1,762,528

* RM(n, k) and eBCH(n, k)_p stand for Reed-Muller code and
extended BCH code with p-type permutation, respectively.

Abstract — Fundamental building blocks for analog
decoders are introduced which transform log-likelihood
ratios into probabilities and vice versa. By
interconnecting these blocks, general trellis modules
can be designed for an analog VLSI implementation
of the APP (BCJR) decoding algorithm.

I.  Introduction 

Analog Viterbi decoders are already employed in magnetic
recording. Recently, analog VLSI implementations of APP
decoders were reported in [1], [2]. For more background
information see the references therein. Such decoders are needed
for complex and time-consuming 'turbo' decoding and 'turbo'
equalization. The main advantages of an analog implementation
include higher speed, smaller size and lower power
consumption when compared to a digital implementation.


II.  Log-Likelihood  Ratios  and  Probabilities 

Consider a discrete random variable $X$ with values
$x \in \{0, \ldots, J-1\}$. The log-likelihood of the probability $P_X(x = j)$
is defined as $l_j(X) = \ln(P_X(x = j))$ by using the natural
logarithm. A log-likelihood ratio of the discrete random variable
$X$ can be expressed by using two outcomes $x = i$ and $x = j$:

$$L_{i,j}(X) = l_i(X) - l_j(X) = \ln \frac{P_X(x = i)}{P_X(x = j)}. \qquad (1)$$

The probability of a possible outcome $x = j$ is determined by

$$P_X(x = j) = \frac{1}{\sum_{i=0}^{J-1} e^{L_{i,j}(X)}}. \qquad (2)$$
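The two relations in this section can be checked numerically. The sketch below forms a log-likelihood ratio as a difference of log-likelihoods and recovers a probability as the reciprocal of a sum of exponentiated ratios; the function names are illustrative, and the constant offset added in the usage lines stands in for an arbitrary scaling of the probabilities.

```python
import math


def llr(log_likelihoods, i, j):
    """L_{i,j}(X) = l_i(X) - l_j(X)."""
    return log_likelihoods[i] - log_likelihoods[j]


def probabilities(log_likelihoods):
    """P(x = j) = 1 / sum_i exp(L_{i,j}(X)) for every outcome j."""
    n = len(log_likelihoods)
    return [
        1.0 / sum(math.exp(llr(log_likelihoods, i, j)) for i in range(n))
        for j in range(n)
    ]


# Usage: a common offset on all log-likelihoods cancels in the ratios.
p = [0.1, 0.2, 0.7]
shifted = [math.log(v) + 3.0 for v in p]
recovered = probabilities(shifted)
```

Because only differences of log-likelihoods enter, the recovered values are normalized probabilities regardless of the offset.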


III.  Analog  Decoder  Implementation 

Elementary devices of an analog decoder are bipolar transistors
and diodes, which realize the exponential and logarithmic
functions, respectively. The collector current $I_C$ of a
bipolar transistor is a function of the base-emitter voltage
$V_{BE}$ with $I_C = I_S e^{V_{BE}/V_T}$, where $I_S$ denotes the transport
saturation current and $V_T$ is a temperature-dependent quantity
($\approx 26$ mV at 300 K). The configuration of the emitter-coupled
transistors shown in Fig. 1 forms a Type II block.
This block is an exact circuit implementation of (2), where
the input voltages $V_j = V_T\, l_j(X) + C$ are transformed into
output currents $I_j = I\, P_X(x = j)$. Here $I$ is the bias current
of the circuit and $C$ is a voltage constant to be chosen
according to the required input voltage range of the circuit.
Log-likelihood ratios are represented by differential input
voltages $V_{i,j} = V_i - V_j = V_T L_{i,j}(X)$. The probability


1 This work was supported by the Wireless Circuits and Systems
Research Department, Bell Labs, Lucent Technologies, 600 Mountain
Avenue, Murray Hill, NJ 07974.


Fig. 1: Type I block: Diode-connected transistors transforming
probabilities into log-likelihood ratios. Type II block: Emitter-coupled
devices transforming log-likelihood ratios into probabilities.
Example: Trellis module for a binary trellis section (XOR operation
of three bits).

multiplication of the APP decoding algorithm can be implemented
by using a stacked configuration of Type II blocks,
where the output currents of one lower block ($J = N$) are used
as bias currents for $N$ upper blocks ($J = M$). Assuming the
lower Type II block for a variable $X_1$ and each upper Type
II block for a variable $X_2$, the output currents of the upper
Type II blocks represent $N M$ probability products $P_{X_1} P_{X_2}$.
The probability summation of the APP decoding algorithm
is simply obtained by connecting the current outputs of the
blocks together. This technique is used in [1] to design trellis
modules with current inputs and in [2] for modules with voltage
inputs. The inversion of the exponential characteristic
can be obtained by using diode-connected transistors, which
form the Type I block in Fig. 1. Such devices act as simple
diodes and generate voltage drops proportional to the logarithm
of the input currents. When the input currents represent
(scaled) probabilities, this circuit exactly implements (1)
with $-V'_{i,j} = V'_{j,i} = V_T L_{i,j}(X)$. The negative sign arises
because all $V'_j$ are voltages to ground rather than voltage
drops across the diodes. Note that any scaling of probabilities
cancels out in (1). The trellis modules in [2] use such Type I
blocks on top of all Type II blocks to transform the (scaled)
probabilities back into the log-likelihood domain, while in [1]
CMOS current mirrors are used to generate output currents
carrying the probability information. For the binary case with
$N = M = 2$ and a single binary trellis section (see Example in
Fig. 1), the circuit implementation results in a Gilbert cell with
diode loads [2], where the overall function can be described by

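$$V'_{1,0} = 2 V_T \tanh^{-1}\left[\tanh\left(\frac{V_{X_1}}{2 V_T}\right) \tanh\left(\frac{V_{X_2}}{2 V_T}\right)\right],$$

where $V_{X_1}$ and $V_{X_2}$ stand for the differential input voltages of the two input variables.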
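The behaviour of the two circuit functions discussed in this section can be sketched numerically. The model below is an idealization, not a circuit simulation: `type2_block` splits a bias current according to exponentials of the input voltages, and `gilbert_cell` applies the tanh rule to two differential voltages. The function names, the sign convention (a voltage equals V_T times ln P(0)/P(1)) and the numeric values are assumptions for illustration.

```python
import math

V_T = 0.026  # thermal voltage in volts, roughly 26 mV at 300 K


def type2_block(voltages, bias_current, v_t=V_T):
    """Ideal emitter-coupled block: output currents proportional to exp(V_j / V_T)."""
    v_max = max(voltages)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp((v - v_max) / v_t) for v in voltages]
    total = sum(weights)
    return [bias_current * w / total for w in weights]


def gilbert_cell(v1, v2, v_t=V_T):
    """Ideal binary trellis section: 2 V_T atanh(tanh(v1 / 2V_T) tanh(v2 / 2V_T))."""
    return 2.0 * v_t * math.atanh(math.tanh(v1 / (2.0 * v_t)) * math.tanh(v2 / (2.0 * v_t)))


# Usage: compare the voltage-domain result with the XOR of two bits computed
# directly in the probability domain.
L1, L2 = 1.2, -0.7                      # input log-likelihood ratios ln P(0)/P(1)
v_out = gilbert_cell(V_T * L1, V_T * L2)
```

Dividing `v_out` by `V_T` gives the log-likelihood ratio of the XOR of the two input bits, which is the operation a binary trellis section has to perform.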

Abstract — We prove a sufficient and a necessary
condition  on  the  existence  of  fix-free  codes,  and  give 
a  few  new  upper  bounds  on  the  redundancy  of  optimal 
fix-free  codes. 


I.  Introduction 

In fix-free codes, no codeword is a prefix or a suffix of any other
codeword. Codes of this kind have several applications [1]. For
example, a file compressed by a fix-free code can be decoded
in the forward direction and the reverse direction simultaneously,
thus halving the decoding time compared with
decoding in one direction only.
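The defining property, and the bidirectional decoding it enables, can be illustrated with a short sketch. The helper names and the three-word example code are assumptions for illustration; the greedy parser relies only on the code being prefix-free in the forward direction and suffix-free in the reverse direction.

```python
from fractions import Fraction


def is_fix_free(code):
    """True if no codeword is a prefix or a suffix of another codeword."""
    words = list(code)
    if len(set(words)) != len(words):
        return False  # a repeated codeword is a prefix (and suffix) of its copy
    return not any(
        a != b and (b.startswith(a) or b.endswith(a))
        for a in words
        for b in words
    )


def kraft_sum(code):
    """Sum of 2^(-length) over the codewords, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(w)) for w in code)


def decode(bits, code, reverse=False):
    """Greedy parse of a bit string from the left, or from the right if reverse."""
    words, out, buf = set(code), [], ""
    for b in (reversed(bits) if reverse else bits):
        buf = b + buf if reverse else buf + b
        if buf in words:
            out.append(buf)
            buf = ""
    if buf:
        raise ValueError("bit string does not parse into codewords")
    return out[::-1] if reverse else out


# Usage: the same stream parses identically from either end.
code = ["0", "11", "101"]
message = ["11", "0", "101", "0"]
stream = "".join(message)
```

In a two-ended decoder, one thread would parse from the left and another from the right until they meet, which is where the halving of decoding time comes from.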


II.  Main  results 


Consider a code with $n$ codewords. The lengths of these codewords
form a vector $(l_1, \ldots, l_k, \ldots, l_n)$, where $l_k$ is the length
of the $k$th codeword. We assume without loss of generality
that $l_1 \le l_2 \le \cdots \le l_n$, and use $l^n$ to denote this ordered set
of codeword lengths. Let $k(i)$ be the smallest $k^*$ such that
$l_{k^*} = l_{i+1}$. Define $(x)^+$ as the positive part of a real number
$x$, i.e.,

$$(x)^+ = \begin{cases} x, & \text{if } x \ge 0, \\ 0, & \text{if } x < 0. \end{cases}$$

Our  main  results  in  [2]  are  summarized  below. 


Theorem 1 (Sufficient Condition) For a set of codeword
lengths $l^n$, define

$$s_l(l^n) = \prod_{i=1}^{n-1} \Bigl( 1 - 2 \sum_{1 \le j \le i} 2^{-l_j} + [i + 1 - k(i)] \cdot 2^{-l_{i+1}} + \sum_{l_j + l_k \le l_{i+1}} 2^{-l_j - l_k} \Bigr)^+ .$$

If $s_l(l^n) > 0$, there exists a fix-free code with lengths $l^n$.


Corollary 3 Let $l^n$ be a set of codeword lengths. If $l_1 = 1$,
then

[inequality not legible in the source]

implies the existence of a fix-free code with lengths $l^n$.


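Theorem 2 (Necessary Condition) For a set of codeword
lengths $l^n$, define

[Definition of $s_u(l^n)$: a product over $i = 1, \ldots, n-1$ of positive
parts whose last term is a sum of $2^{(l_{i+1} - l_j - l_k)^+ - l_{i+1}}$; the
remaining terms are not legible in the source.]

If $s_u(l^n) = 0$, no fix-free code with lengths $l^n$ exists.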

Theorem 3 For a set of codeword lengths $l^n$, if for $1 \le i \le n-1$,
either $l_i = l_{i+1}$ or $2 l_i < l_{i+1}$, then there exists a fix-free
code with lengths $l^n$ if and only if

$$\prod_{i=1}^{n-1} \Bigl( 1 - 2 \sum_{1 \le j \le i} 2^{-l_j} + [i + 1 - k(i)] \cdot 2^{-l_{i+1}} + \sum_{l_j + l_k \le l_{i+1}} 2^{-l_j - l_k} \Bigr)^+ > 0.$$


Let $q$, $p_1$ and $p_n$ be the probability of any given source
symbol, the probability of the most likely source symbol and
the probability of the least likely source symbol, respectively.
Denote by $R$ the redundancy of an optimal fix-free code.

Theorem 4

$$R \le \begin{cases} 2 - H_b(q) - (1 - q) \log\bigl(1 - 2^{-\lceil -\log q \rceil}\bigr) - q \bigl(1 - \lceil -\log q \rceil\bigr), & \text{if } q < 0.5, \\ 4 - 3q - H_b(q), & \text{if } q \ge 0.5. \end{cases}$$

Corollary 4 For any fixed $n$, where $n$ is the size of the source
alphabet, it is impossible to construct a sequence of source
distributions for which $R$ tends to 2.


Corollary 1 Let $l^n$ be a set of codeword lengths. If

$\sum_{1 \le j \le n} 2^{-l_j} \le$ [right-hand side not legible in the source],

then there exists a fix-free code with lengths $l^n$.

Corollary 2 Let $l^n$ be a set of codeword lengths. If

$\sum_{1 \le j \le n} 2^{-l_j} \le$ [bound not legible in the source],

then there exists a fix-free code with lengths $l^n$.


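Theorem 5

$$R \le \min\Bigl[\, 4 - 3p_1 - H_b(p_1),\; 2 - H_b(p_1) - (1 - p_1) \log\bigl(1 - 2^{-\lceil -\log p_1 \rceil}\bigr) - p_1 \bigl(1 - \lceil -\log p_1 \rceil\bigr) \Bigr].$$

Theorem 6

$$R \le 2 - H_b(p_n) - (1 - p_n) \log\bigl(1 - p_n + 2^{-\lceil -\log p_n \rceil}\bigr) - p_n \bigl(1 - \lceil -\log p_n \rceil\bigr).$$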

References 

[1] J. L. Peterson, “Computer programs for detecting and correcting
spelling errors,” Comm. ACM, 23, pp. 676-687, 1980.

[2] C. Ye and R. W. Yeung, “Some basic properties of fix-free
codes,” accepted by IEEE Trans. Inform. Theory.


0-7803-5857-0/00/$10.00 ©2000 IEEE.



ISIT 2000, Sorrento, Italy, June 25-30, 2000


Variable-rate  Codes  for  Synchronization  with  Timing* 

Navin  Kashyap  and  David  L.  Neuhoff 
EECS  Dept.,  University  of  Michigan 
Ann  Arbor,  MI  48109-2122,  USA 
{nkashyap, neuhoff}@eecs.umich.edu


Abstract — This paper proposes and analyzes variable-rate
sync-timing codes that resynchronize after
the encoded bits are corrupted by insertions, deletions
or substitution errors, and also produce estimates of
the time indices of the decoded data.

I.  Introduction 

Information theory, which has traditionally focused on encoding
data values, has apparently overlooked the problem
of encoding data time indices. The latter is necessary in most
situations where conventional data synchronization is needed,
i.e., when the encoded bit stream is corrupted by insertions,
deletions or substitution errors. For example, suppose an infinite
sequence of temperatures, 43, 64, 27, 54, 36, 42, 73, 45, ...,
corresponding to cities Det, LA, Chi, Bos, NY, StL, Mia,
Bal, ..., is encoded into an infinite sequence of bits, which
a decoder must begin decoding midstream, at some arbitrary
point. Since the decoder does not initially know how to parse
the arriving bits into codewords, it will ordinarily produce
erroneous outputs, until at some point, it acquires synchronization
and produces correct outputs from then on. For example,
if it produces 73, 40, 54, 36, 42, 73, 45, ..., then it has acquired
sync when producing “54”. However, the decoded data is of
limited value because the correspondence between temperatures
and cities has been lost. What is needed is a system that
encodes data time indices, as well as data values, so that the
decoder produces estimates of both, i.e., a sequence of temperature
and time index pairs, like (73, 12), (40, 7), (54, 4), (36, 5),
(42, 6), (73, 7), (45, 8), .... We refer to codes that encode and
decode data time indices, as well as data values, as sync-timing
codes. They are essential in video coding, where they ensure
frame-sync, as well as audio-video sync.

While conventional sync codes have been much studied
(cf. [1]), only ad hoc techniques have been developed for
sync-timing, e.g. the marker systems in JPEG and MPEG. Indeed,
the sync-timing problem has only recently been identified as
such, and only recently has a theory begun to emerge [2].
However, the theory in [2] applies only to fixed-rate schemes,
whereas source codes are often variable-rate, and such codes
are the most sensitive to errors. In this paper, we initiate
a theory of variable-rate sync-timing codes by introducing a
family of such codes and analyzing them on the basis of the
performance measures used in [2]: coding rate, resynchronization
delay and timing span (which measures the code’s ability
to reproduce time indices). We find that the asymptotic
performance of the variable-rate sync-timing codes studied here
is the same as that of the best known fixed-rate codes.

II. A Variable-Rate Sync-Timing Code

Here,  we  describe  a  variable-rate  sync-timing  encoder  that  is 
designed  to  follow  a  binary  source  encoder  that  maps  blocks 

*This work was supported by NSF Grant NCR-9415754.


of $k$ source symbols into codewords with average length $kL$.
The output of the sync-timing encoder, in response to a source
codeword, is a sync-timing codeword that consists of one of $p$
distinct markers, followed by the source codeword after it has
undergone bitstuffing, followed by a zero. The marker prefixed
to the $j$th “bitstuffed” codeword comprises a flag of $m_1$ consecutive
ones (denoted by $1^{m_1}$), followed by a zero, followed by a
block index codeword for $j - 1 \bmod p$, followed by another zero.
The block index codewords are $p$ distinct binary sequences of
length $m_2$, each of which does not contain the flag. Bitstuffing
prevents the appearance of a flag in each source codeword by
“stuffing” a zero immediately after each occurrence of $1^{m_1 - 1}$
in the codeword, the idea being that the flag can then be used
for synchronization purposes. The structure of this variable-rate
sync-timing code is similar to that of the cascaded code
described in [2]. Note that the source encoder could be lossless
or lossy, and it need not process blocks independently.
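The codeword layout just described (flag, zero, block index codeword, zero, bitstuffed source codeword, zero) can be sketched as follows, together with a decoder that locates flags and reverses the encoding. This is a minimal illustration under stated assumptions: bits are '0'/'1' strings, m1 is at least 2, and the block index codewords are simply the first p flag-free strings of length m2 in lexicographic order, which is a choice made for this example and not specified in the text. All function names are hypothetical.

```python
from itertools import product


def index_codewords(p, m1, m2):
    """First p binary strings of length m2 that do not contain the flag 1^m1 (assumed choice)."""
    flag, words = "1" * m1, []
    for bits in product("01", repeat=m2):
        w = "".join(bits)
        if flag not in w:
            words.append(w)
            if len(words) == p:
                return words
    raise ValueError("m2 is too small for p distinct flag-free index codewords")


def stuff(bits, m1):
    """Insert a zero after every run of m1 - 1 consecutive ones."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == m1 - 1:
            out.append("0")
            run = 0
    return "".join(out)


def unstuff(bits, m1):
    """Drop the bit that follows every run of m1 - 1 consecutive ones."""
    out, run, i = [], 0, 0
    while i < len(bits):
        out.append(bits[i])
        run = run + 1 if bits[i] == "1" else 0
        if run == m1 - 1:
            i += 1  # skip the stuffed zero
            run = 0
        i += 1
    return "".join(out)


def encode_block(source_bits, index, p, m1, m2):
    """flag | 0 | block index codeword | 0 | stuffed source codeword | 0"""
    idx = index_codewords(p, m1, m2)[index % p]
    return "1" * m1 + "0" + idx + "0" + stuff(source_bits, m1) + "0"


def decode_stream(stream, p, m1, m2):
    """Locate flags and reverse the encoding of each flag-to-flag segment where possible."""
    flag = "1" * m1
    table = {w: j for j, w in enumerate(index_codewords(p, m1, m2))}
    results, pos = [], stream.find(flag)
    while pos != -1:
        nxt = stream.find(flag, pos + m1)
        body = stream[pos + m1:(nxt if nxt != -1 else len(stream))]
        if (len(body) >= m2 + 3 and body[0] == "0" and body[m2 + 1] == "0"
                and body[-1] == "0" and body[1:m2 + 1] in table):
            results.append((table[body[1:m2 + 1]], unstuff(body[m2 + 2:-1], m1)))
        pos = nxt
    return results
```

Starting the decoder in the middle of the stream loses only the partially received first codeword; every later block is recovered together with its block index, from which time indices can be assigned.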

The decoder locates the flags in the stream of received bits,
and for each flag found, it reverses (if possible) the encoding
procedure on the sequence up to the next flag. Thus,
every successful reversal of the encoding yields the integer
$j$ encoded by the block index codeword, and the reproductions
of the $k$ source symbols encoded by the source codeword.
The $i$th source symbol decoded receives the time index $jk + i$.
Since $j \le p - 1$, the time indices produced by the decoder are
modulo-$kp$ reductions of the actual time indices of the data.

We measure the performance of this code in terms of delay,
rate and timing span, as defined in [2], after modifying their
definitions slightly to account for the variable-rate nature of
this code. Thus, we define resynchronization delay to be the
average length of a sync-timing codeword, so that
$D = m_1 + m_2 + k(L + S) + 3$, where $kS$ is the average number of stuffed
bits added per codeword. Rate, $R = kL/D$, is a measure of the
redundancy introduced by the sync-timing encoder. Finally,
the timing span of the code, $T = kp$, is a measure of the code’s
ability to reproduce time indices. We want $D$ to be small, $R$
to be close to 1, and $T$ to be large. Let $T(r, d)$ denote the
maximum timing span achievable by such a code with rate at
least $r$ and delay at most $d$. The following theorem shows that
the rate of growth of $T(r, d)$ with $d$ has the same asymptotic
form as that for fixed-rate codes (cf. [2]).
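A short worked computation of the three performance measures may help fix the definitions. The function name and the parameter values in the usage lines are made up for illustration; only the three formulas come from the text.

```python
def sync_timing_performance(m1, m2, k, L, S, p):
    """Return (D, R, T): resynchronization delay, rate and timing span."""
    D = m1 + m2 + k * (L + S) + 3   # flag, index codeword, stuffed payload, three zeros
    R = k * L / D                   # fraction of the codeword carrying source-code bits
    T = k * p                       # time indices are reproduced modulo k * p
    return D, R, T


# Usage with illustrative values: 8-bit flag, 10-bit block index, 100 source
# symbols per block at 2 bits per symbol, 0.01 stuffed bits per symbol, p = 512.
D, R, T = sync_timing_performance(m1=8, m2=10, k=100, L=2.0, S=0.01, p=512)
```

With these values the delay is about 222 bits, the rate about 0.90 and the timing span 51,200 symbols, which shows the trade-off: a longer index codeword raises T exponentially in m2 while costing only m2 bits of delay.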

Theorem: For $r \in (0, 1)$, (i) for any $d > 0$,
$T(r, d) \le (d/L)\, 2^{d(1-r)}$; (ii) for any $\epsilon > 0$, there exists
$d(r, \epsilon) > 0$ such that for all $d > d(r, \epsilon)$,
$T(r, d) \ge \frac{rd}{L(1+\epsilon)}\, 2^{d(1-r)/(1+\epsilon)}$;
and (iii) $\lim_{d \to \infty} (\log_2 T(r, d))/d = 1 - r$.

Abstract — We examine a specific strategy of using
sync markers for error containment in compressed
data, using a model that separates the data compression
and error containment stages.

I.  A  Simple  Error  Containment  Scheme 

Consider  a  data  compressor  that  maps  blocks  of  source 
symbols  to  variable  length  binary  sequences.  At  the  output 
of  the  compressor,  we  assume  that  zeros  and  ones  are  equally 
likely  and  that  an  error  at  this  point  would  be  undetectable 
to  the  decompressor.  To  limit  the  effect  of  channel  errors  we 
use  a  special  sequence  called  a  sync  marker.  The  length  m 
sync  marker  s  is  inserted  between  compressed  blocks  so  that 
block  boundaries  can  be  identified  following  a  channel  error. 
To  prevent  the  chance  occurrence  of  the  sync  marker  within 
the  compressed  data  sequence,  we  insert  a  bit  whenever  we 
observe  the  first  m  —  1  bits  of  the  sync  marker. 

Our aim is to determine how to choose sync markers and
to analyze the impact of this strategy on overall performance. A
complete version of these results is available in [2]. Related
work includes [1, 3].

II.  Error  Containment  Performance 

The bit insertion procedure can be described using a state
diagram that is essentially the same as that in [1]. The expected
total number of bits $I_n(s)$ inserted in a block of $n$
compressed bits can be computed from the transition matrix
for the state diagram [2].

Two length $m$ sync markers $s$ and $t$ are said to be equivalent
if they give identical performance in the error containment
scheme, i.e., when $I_n(s) = I_n(t)$ for all $n > 0$. Two equivalent
sync markers can have very different state diagrams.

We define the overlap set of the bit string $s = s_1 s_2 \cdots s_m$
as $V(s) = \{1 \le i \le m : s_1 \cdots s_i = s_{m-i+1} \cdots s_m\}$. In other
words, $i \in V(s)$ if $s$ can be written twice with $i$ identical bits
overlapping. Let $V'(s)$ denote the overlap set of $s$ with the
last bit inverted.

Theorem 1 If $s$ and $t$ are length $m$ sync markers for which
$V(s) = V(t)$ and $V'(s) = V'(t)$, then $s$ and $t$ are equivalent.

The asymptotic growth rate of the average number of inserted
bits $I_n(s)$ can be neatly evaluated for any sync marker $s$:

Theorem 2 If $s$ is a length $m$ sync marker, then

$$\lim_{n \to \infty} \frac{1}{n} I_n(s) = \Bigl( 2^{m-1} - 1 + \sum_{i \in V(s)} 2^{i-1} - \sum_{i \in V'(s)} 2^{i-1} \Bigr)^{-1}.$$
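The overlap sets and the asymptotic insertion rate can be computed directly for small markers. The sketch below assumes that the overlap set ranges over 1 <= i <= m (the i = m terms of the two sums cancel, so the rate does not depend on that choice) and that the rate is the reciprocal of the bracketed expression; function names are illustrative.

```python
from fractions import Fraction


def overlap_set(s):
    """All i in 1..m such that the first i bits of s equal its last i bits."""
    m = len(s)
    return {i for i in range(1, m + 1) if s[:i] == s[m - i:]}


def flip_last(s):
    """The marker s with its last bit inverted."""
    return s[:-1] + ("0" if s[-1] == "1" else "1")


def asymptotic_insertion_rate(s):
    """Limit of I_n(s) / n as an exact fraction."""
    m = len(s)
    denominator = (
        2 ** (m - 1) - 1
        + sum(2 ** (i - 1) for i in overlap_set(s))
        - sum(2 ** (i - 1) for i in overlap_set(flip_last(s)))
    )
    return Fraction(1, denominator)


# Usage: one length-3 marker from each of the classes 1^m, 1 0^(m-1), 1 0^(m-2) 1.
rates = {s: asymptotic_insertion_rate(s) for s in ("111", "100", "101")}
```

For the all-ones marker the result, one inserted bit per 2^m - 2 data bits, agrees with the mean waiting time for a run of m - 1 ones in a fair coin sequence, which gives an independent check of the formula.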

1 The work described was funded by the TMOD Technology Program
and performed at the Jet Propulsion Laboratory, California
Institute of Technology under contract with the National Aeronautics
and Space Administration.


Given a blocklength $n$, for any fixed sync marker length
$m < 8$, we have verified empirically (and we conjecture that
it is true for all $m$) that the optimal sync marker must belong
to one of three classes. These classes are those containing
$10^{m-1}$, $10^{m-2}1$, and $1^m$, which we refer to as class-1, class-2,
and class-3, respectively.

Theorem 3 For class-1, class-2, and class-3 sync markers,
the average number of inserted bits $I_n(s)$ takes the following
form wherever it is nonzero:

$$I_n(s) = \frac{n}{a(s)} + b(s) + c_n(s)\, 2^{-n},$$

where $a(s)$, $b(s)$ are independent of $n$, and $c_n(s)$ is periodic in
$n$ with a short period on the order of the length of $s$. Expressions
for $a(s)$, $b(s)$, and $c_n(s)$ are given in [2] for each of the
three special classes.

To compare the performance of sync markers of different
lengths, we compute the average data expansion, i.e., the average
number of extra bits that are added to each data block
for synchronization purposes. The average data expansion is
$X_n(s) = |s| + I_n(s)$, where $|s|$ denotes the length of sync marker
$s$. In Figure 1 we plot the difference between the globally optimum
average data expansion $\min_s X_n(s)$ and $\log_2 n$.

For large $n$, class-1 and class-2 markers take approximately
equal turns at being optimum, and the globally optimum average
data expansion is confined to a tight range of values
between $\log_2 n + 1.9$ and $\log_2 n + 2$. Class-3 markers, while
asymptotically optimum for any fixed marker length $m$, are
never globally optimum.


Figure 1: Globally minimum average data expansion,
$\min_s X_n(s)$, plotted as a difference relative to $\log_2 n$.

I. Introduction

Lossless data compression, such as arithmetic coding and Ziv-Lempel
coding, is widely used for text compression [1]. Since errors
in compressed data seriously affect decompression, error
control for compressed data is necessary. From this viewpoint, the
authors proposed burst error recovery in arithmetic coding [2]. This
paper proposes error recovery schemes for LZW coding and LZ77
coding [1] by using Unequal Error Protection (UEP) schemes, which
protect the important parts of the compressed data more strongly.

II.  Error  Recovery  for  LZW  Coding 

Let the input alphabet [1] contain $q$ characters. Initially, the dictionary
contains $q$ distinct phrases, each of which corresponds to one
character in the input alphabet. A phrase is a word, part of a word,
or several words [1]. The compressor searches the dictionary for a
phrase which matches the input and dispenses the pointer of the
phrase. At the same time, a new phrase is parsed and added into
the dictionary until the number of phrases reaches its limit. Let the
size of the dictionary be $M$ phrases. Then the length of the pointer
is $\lceil \log_2 M \rceil$ bits, where $\lceil x \rceil$ is the smallest integer greater than or
equal to $x$, and the first $(M - q)$ pointers are used to rebuild the
dictionary in decompression. Thus the former part is more important
than the latter part and should be more strongly protected.

The  algorithm  of  the  proposed  scheme  is  as  follows. 

(1) Divide the compressed data into two parts, where the former
part consists of the first $(M - q) \times \lceil \log_2 M \rceil$ bits and the latter
part consists of the remaining compressed data.

(2) Apply a $t_1$-byte error correcting code to the former part and
a $t_2$-byte error correcting code to the remaining latter part,
where $t_1 > t_2$.
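The split point in step (1) is easy to evaluate. The sketch below computes the pointer length and the size of the strongly protected former part; the function names are illustrative, and the usage lines plug in the dictionary size and alphabet size used in the experiment reported for Fig. 1.

```python
def pointer_bits(M):
    """Pointer length ceil(log2 M) for a dictionary of M >= 2 phrases, in exact integer arithmetic."""
    return (M - 1).bit_length()


def former_part_bits(M, q):
    """Bits occupied by the first M - q pointers, i.e. those that rebuild the dictionary."""
    return (M - q) * pointer_bits(M)


# Usage: M = 8192 phrases, q = 256 characters.
bits = former_part_bits(8192, 256)   # 7936 pointers of 13 bits each
size_in_bytes = bits // 8
```

For these values the former part is 103,168 bits, that is 12,896 bytes, or roughly the first half of the 26,689-byte compressed file used in the experiment.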

Fig. 1 shows the relation between the error location in the compressed
data and the relative amount of errors in the decompressed
data for the proposed scheme with $M = 8192$, $q = 256$, $t_1 = 5$
and $t_2 = 1$. The source file is “paper1” [1]. The check bit length is
116 bits and 48-bit burst errors are injected. The relative amount
of errors is given by the percentage of erroneous lines among the
total number of lines. Here, a line is a group of phrases separated
from each other by a “return mark”. For comparison, the cases
of applying the conventional four-byte error correcting code with
the same number of check bits as that of the proposed UEP scheme,
denoted as “4bEC”, to the whole compressed data, as well as of no
error correcting code, denoted as “Without ECC”, are also shown.
Fig. 1 shows that the proposed scheme controls errors more
efficiently than conventional error control coding. Simulation results
for other source files are similar.


[Fig. 1 plot: relative amount of errors in the decompressed data versus
error location in compressed data (bytes), 0 to 25,000. Curves: proposed
UEP scheme (116 check bits, t1 = 5, t2 = 1); 4bEC code (116 check bits);
Without ECC. Annotations: 12,896 bytes for re-building dictionary;
source “paper1”, 53,161 bytes, 1250 lines, compressed size 26,689 bytes;
errors: 48-bit burst errors.]

Fig. 1: Error recovery capability of UEP scheme for LZW coding


III.  Error  Recovery  for  LZ77  Coding 

The compressor of LZ77 coding searches a fixed-size sliding window for
the longest phrase which matches the current input. In the $i$-th
match, the compressor dispenses a fixed-length pointer $(o_i, f_i, u_i)$,
where $o_i$ is the offset of the match, $f_i$ the matched length and $u_i$
the first character which does not match. In decompression of the
$i$-th pointer, the decompressor copies an $f_i$-symbol phrase from the
position indicated by $o_i$ and shifts the copied phrase as well as the
character $u_i$ into the sliding window and the output buffer.

The influence of errors occurring in the $f_i$'s is quite different from
that of errors in the $o_i$'s or $u_i$'s. If an error occurs in an $f_i$, the
length of the copied phrase is changed. This affects the window shift,
i.e., the $o_i$'s of the following pointers will indicate phrases different
from the correct ones. Hence, all the following decoded phrases are
corrupted. Simulation shows that an error in the $f_i$'s causes on average
over 40 times more damage than one in the $o_i$'s or $u_i$'s. In addition,
errors in $f_i$'s located in the former part of the compressed data have a
more serious influence than those in the latter part. Thus, the algorithm
of the proposed scheme is as follows.

(1) Group the compressed output $(o_1, f_1, u_1), (o_2, f_2, u_2), \ldots,
(o_n, f_n, u_n)$ into two sequences $\{o_1, u_1, o_2, u_2, \ldots, o_n, u_n\}$
and $\{f_1, f_2, \ldots, f_n\}$.

(2) Apply an $l_1$-bit burst error correcting code to
$\{o_1, u_1, o_2, u_2, \ldots, o_n, u_n\}$.

(3) Apply an $l_2$-bit burst error correcting code to
$\{f_1, f_2, \ldots, f_{\lfloor n/2 \rfloor}\}$ and an $l_3$-bit burst error
correcting code to $\{f_{\lfloor n/2 \rfloor + 1}, f_{\lfloor n/2 \rfloor + 2}, \ldots, f_n\}$.
Here, $l_2 > l_3$.

(4)  Append  the  check  bits  obtained  in  Steps  (2)  and  (3)  at  the 
end  of  the  compressed  data. 
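
The grouping in Steps (1) and (3) can be sketched as follows. This is only an illustration of how the pointer fields are separated into differently protected sequences. The burst error correcting codes themselves are not implemented, and the function name `group_pointers` is hypothetical.

```python
def group_pointers(pointers):
    """Split (offset, length, char) pointers into the three protected groups.

    Returns the interleaved offset/character sequence of Step (1) and the
    two halves of the matched-length sequence used in Step (3).
    """
    offsets_and_chars = []
    lengths = []
    for offset, length, char in pointers:
        offsets_and_chars.extend([offset, char])
        lengths.append(length)
    half = len(lengths) // 2  # floor(n / 2)
    return offsets_and_chars, lengths[:half], lengths[half:]


ou, first_half, second_half = group_pointers(
    [(3, 2, "a"), (1, 4, "b"), (7, 1, "c")]
)
```

The first half of the lengths would then receive the stronger code, since errors early in the compressed data propagate further.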

Fig. 2 shows the relation between the error location in the compressed
data and the relative amount of errors in the decompressed data for
the proposed scheme with $l_1 = 16$, $l_2 = 12$ and $l_3 = 8$. The
source file is “paper1” [1]. The check bit length is 105 bits and
48-bit burst errors are injected. The results in Fig. 2 are obtained
in the same way as those in Fig. 1. For comparison, the cases of
applying the 40-bit burst error correcting Fire code with 119 check
bits and of applying no error correcting code, denoted “Without ECC”,
are also shown. Fig. 2 indicates that the proposed scheme controls
errors more effectively than the 40-bit burst error correcting Fire
code. Simulation results for other source files lead to a similar
conclusion.


Fig.  2:  Error  recovery  capability  of  UEP  scheme  for  LZ77  coding 
Abstract — We examine communication over block fading additive
Gaussian channels with delayed channel state information feedback and
finite decoding delay constraints. Following an approach due to Cover
[1], we apply the broadcast strategy and find the maximum achievable
expected rate in cases where the underlying parallel Gaussian
broadcast channels are not degraded in the same direction.

I.  Introduction 

Consider a single user communicating over a discrete time additive
Gaussian noise channel where the noise variance stays constant over a
block of $N$ symbols but may vary from block to block. The channel
model over the $k$th block ($k \in \mathbb{Z}$) is

$$Y_k = X_k + Z_k \tag{1}$$

where $X_k = (X_{k1}, \ldots, X_{kN})$ and $Y_k = (Y_{k1}, \ldots, Y_{kN})$
are vectors in $\mathbb{R}^N$ representing the inputs and outputs of the
channel over the $k$th block. $Z_k$ is a Gaussian random vector with
mean zero and covariance matrix $S_k I$. Assume that the noise variance
process $\{S_k\}$ is stationary ergodic and that $\mathcal{S}$, the
state space of $\{S_k\}$, is a finite subset of $\mathbb{R}$. The
$Z_k$'s are assumed to be independent. This is an equivalent model for
a version of the block fading channel. For convenience, we refer to
(1) as the block Gaussian channel (BGC).

Assume that during the $k$th block, the receiver has perfect knowledge
of the noise variances $S_{-\infty}^{k} = (\ldots, S_{k-1}, S_k)$, while
the transmitter has perfect knowledge only of $S_{-\infty}^{k-d}$, where
$d \geq 1$, via delayed noiseless feedback. Next, suppose the system
has a maximum allowable decoding delay of $KN$ symbols
($K \in \mathbb{N}$). The goal is to maximize the expected rate
(expectation is over the fading process $\{S_k\}$) over the BGC subject
to the decoding delay constraint and an average power constraint

$$\frac{1}{KN} \sum_{k=1}^{K} \sum_{i=1}^{N} E[X_{ki}^2] \leq P.$$

II.  Decoding  Delay  of  One  Block  (K  =  1) 

For the one block case, we show that the maximum expected rate per
block is attained by a broadcast strategy which associates noise
variance levels in the BGC with corresponding receivers in a degraded
broadcast channel. The optimal power splitting parameters are chosen
according to the conditional probabilities of the current channel
state given all previous channel states.

Theorem 1 Consider a BGC with an average transmit power constraint $P$
and noise power varying according to a stationary ergodic process
$\{S_k, k \in \mathbb{Z}\}$ with state space
$\mathcal{S} = \{\eta_1, \ldots, \eta_L\}$,
$\eta_1 > \eta_2 > \cdots > \eta_L$. Suppose the decoding delay
constraint is $N$ symbols and noiseless channel state feedback to the
transmitter is delayed by $d$ blocks. If arbitrarily small error
probability is required, the expected rate per block satisfies


Es*  [R]  <  Es 


L(=l 


E  j>lo'HSk_-J)p  +  ni 


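where $Q_l(S_{-\infty}^{k-d}) = \sum_{j=l}^{L} P(S_k = \eta_j \mid S_{-\infty}^{k-d})$,
and $\alpha^*(S_{-\infty}^{k-d}) = (\alpha_1^*(S_{-\infty}^{k-d}), \ldots, \alpha_L^*(S_{-\infty}^{k-d}))$
maximizes

$$\sum_{l=1}^{L} \left[ \sum_{j=l}^{L} P(S_k = \eta_j \mid S_{-\infty}^{k-d}) \right] \log\left(1 + \frac{\alpha_l P}{\sum_{j>l} \alpha_j P + \eta_l}\right)$$

subject to $\alpha_l \geq 0$, $\sum_{l=1}^{L} \alpha_l = 1$.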


III.  Decoding  Delay  of  Two  Blocks  ( K  =  2) 

The main challenge in extending Theorem 1 to the case of $K > 1$ is
that, unlike the one-block case, the underlying parallel Gaussian
broadcast channels for $K > 1$ are in general not degraded in the same
direction. Nevertheless, we extend El Gamal’s work [2] on the capacity
region for the two-receiver two-parallel Gaussian broadcast channel
with common information to conclude that the broadcast strategy
remains optimal for the case of a two-state i.i.d. BGC with $K = 2$.

Theorem 2 Consider a two-state i.i.d. BGC with noise variance
$S = \eta_1$ with probability $q$ and $S = \eta_2$ w.p. $1 - q$. Let the
average transmit power constraint be $P$ and the decoding delay
constraint be two $N$-blocks. If arbitrarily small error probability
is required, the expected rate per block satisfies


where $\bar{\alpha} = 1 - \alpha$, and $\alpha^* = (\alpha_1^*, \alpha_2^*, \alpha_3^*)$
maximizes the RHS in (2) in $\alpha$ subject to $\alpha_i \geq 0$,
$i = 1, 2, 3$ and $\sum_{i=1}^{3} \alpha_i = 1$.
Abstract — A game theory formulation for the Gaussian interference
channel is given under the assumption that no interference subtraction
is performed. The existence, uniqueness and stability of a pure
strategy Nash equilibrium are established under a mild-interference
condition.

I.  Introduction 

In a Gaussian interference channel, two independent sender-receiver
pairs attempt to communicate in the presence of interference from each
other. The capacity region for the interference channel is still an
open problem. The largest rate region presently known is achieved with
superposition coding and interference subtraction [1], but its
optimality is not yet known. However, the traditional view of the
interference channel allows the two senders, while remaining
independent, to be cooperative in their respective coding strategies.
If such cooperation cannot be assumed, the interference channel
becomes a non-cooperative game. This paper studies the interference
channel from this game theory perspective. We focus on Gaussian
interference channels with memory, but make the simplifying assumption
that no interference subtraction is performed regardless of
interference strength. We ask the following questions: if each
sender’s sole objective is to maximize its own data rate, can an
equilibrium be achieved in a competitive environment? If so, is such
an equilibrium unique?

II.  Competitive  Equilibrium 

The two senders in a Gaussian interference channel,

$$y_1 = x_1 + A_2 x_2 + n_1 \tag{1}$$

$$y_2 = x_2 + A_1 x_1 + n_2, \tag{2}$$

are considered as two players in a game. The channel transfer
functions are normalized to unity. The squared magnitudes of the
interference transfer functions $A_1$ and $A_2$ are denoted as
$\alpha_1(f)$ and $\alpha_2(f)$. Let $N_1(f)$ and $N_2(f)$ denote noise
power spectrum densities. The structure of the game, i.e., the
interference coupling functions and noise power, is common knowledge
to both players. A strategy$^2$ for each player is its transmit power
spectrum, $P_1(f)$ and $P_2(f)$, subject to the power constraints
$\int_0^F P_1(f)\,df \leq P_1$ and $\int_0^F P_2(f)\,df \leq P_2$,
respectively. The payoffs are data rates:

$$R_1 = \int_0^F \log\left(1 + \frac{P_1(f)}{N_1(f) + \alpha_2(f) P_2(f)}\right) df, \tag{3}$$

$$R_2 = \int_0^F \log\left(1 + \frac{P_2(f)}{N_2(f) + \alpha_1(f) P_1(f)}\right) df, \tag{4}$$

where bandwidth up to $F$ is used. The game is not zero-sum. We are
interested in characterizing its pure strategy Nash equilibrium.

1This work was supported by a Stanford Graduate Fellowship.

2Only  pure  (or  deterministic)  strategy  is  considered  here. 


A Nash equilibrium is a strategy profile in which each player’s
strategy is an optimal response to the other player’s strategy [2].
For the interference channel, the optimal response for a player is the
waterfilling of its power with respect to the combined noise and
interference. If the power distributions are such that waterfilling is
achieved simultaneously for both players, a Nash equilibrium is
reached. At a Nash equilibrium, neither player has an incentive to
move away from its present power distribution.

Theorem: Suppose that $\sup_f \alpha_1(f) \cdot \sup_f \alpha_2(f) < 1$.
Then a pure strategy Nash equilibrium in the Gaussian interference
game exists, is unique, and is stable.

Proof: The first idea is to recognize that if
$\alpha_1(f)\,\alpha_2(f) < 1$ $\forall f$, there is a Nash equilibrium
corresponding to every water level $(K_1, K_2)$. For fixed
$(K_1, K_2)$, the Nash equilibrium $(P_1(f), P_2(f))$ is found by
solving the waterfilling condition:

$$P_1(f) + \alpha_2(f) P_2(f) + N_1(f) = K_1 \tag{5}$$

$$P_2(f) + \alpha_1(f) P_1(f) + N_2(f) = K_2, \tag{6}$$

unless $\alpha_1(f) > \frac{K_2 - N_2(f)}{K_1 - N_1(f)}$, in which case
$P_1(f) = \max\{0, K_1 - N_1(f)\}$ and $P_2(f) = 0$, or
$\alpha_2(f) > \frac{K_1 - N_1(f)}{K_2 - N_2(f)}$, in which case
$P_2(f) = \max\{0, K_2 - N_2(f)\}$ and $P_1(f) = 0$.
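
For a single frequency and fixed water levels, the waterfilling condition (5)-(6) is a two-by-two linear system, so the equilibrium powers can be computed directly. The Python sketch below is an illustration only: the function name `nash_powers` and its scalar arguments are hypothetical, and the corner cases are handled by checking which player's best response is zero.

```python
def nash_powers(K1, K2, N1, N2, a1, a2):
    """Equilibrium powers at one frequency for water levels (K1, K2).

    a1, a2 are the squared interference gains at this frequency and
    must satisfy a1 * a2 < 1.
    """
    if a1 * a2 >= 1:
        raise ValueError("requires a1 * a2 < 1")
    g1, g2 = K1 - N1, K2 - N2
    det = 1.0 - a1 * a2
    # Interior solution of (5)-(6), valid when both powers are nonnegative.
    p1 = (g1 - a2 * g2) / det
    p2 = (g2 - a1 * g1) / det
    if p1 >= 0 and p2 >= 0:
        return p1, p2
    # Corner case: player 1 waterfills alone if player 2's response is zero.
    c1 = max(0.0, g1)
    if g2 - a1 * c1 <= 0:
        return c1, 0.0
    # Otherwise player 2 waterfills alone.
    return 0.0, max(0.0, g2)


interior = nash_powers(2.0, 2.0, 0.5, 0.5, 0.5, 0.5)
corner = nash_powers(2.0, 1.0, 0.0, 0.0, 0.9, 0.1)
```

In the second call the interference gain seen by player 2 is large enough that its best response at this frequency is to transmit nothing.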

Next, we establish that for a given power constraint $(P_1, P_2)$,
there exists $(K_1, K_2)$ whose Nash equilibrium has exactly this
power. For each $(K_1, K_2)$, denote the power level at the
corresponding Nash equilibrium as $(P_{K_1}, P_{K_2})$. Observe that
when $\alpha_1(f)\,\alpha_2(f) < 1$, if $K_1 < K_1'$ and $K_2 = K_2'$,
then $P_{K_1} < P_{K_1'}$ and $P_{K_2} > P_{K_2'}$. Now, start with
$K_1 = K_2 = 0$. Increase $K_1$ until $P_{K_1} = P_1$, then increase
$K_2$ until $P_{K_2} = P_2$. But then, we have $P_{K_1} < P_1$ by the
observation. So, we can increase $K_1$ again until $P_{K_1} = P_1$,
then increase $K_2$, etc. The increasing sequences of $K_1$'s and
$K_2$'s converge because they cannot go to infinity with finite power
constraints. The limit point is a Nash equilibrium corresponding to
$(P_1, P_2)$.

To prove uniqueness, let $(P_1^N(f), P_2^N(f))$ be the power
distribution at a Nash equilibrium. Start with any power distribution
$P_1^{(0)}(f)$ that satisfies the power constraint. Waterfill for
$P_2^{(0)}(f)$, assuming $P_1^{(0)}(f)$ as interference. Then,
waterfill for $P_1^{(1)}(f)$, assuming $P_2^{(0)}(f)$ as interference,
etc. This iterative waterfilling process converges in $L_1$-norm
$\int_0^F |P_1^{(k)} - P_1^N|\,df$ because

$$\max\left(\|(P_1^{(k+1)} - P_1^N)^+\|_1, \|(P_1^{(k+1)} - P_1^N)^-\|_1\right)
\leq \sup \alpha_2(f) \cdot \max\left(\|(P_2^{(k)} - P_2^N)^+\|_1, \|(P_2^{(k)} - P_2^N)^-\|_1\right)
\leq \sup \alpha_2(f) \sup \alpha_1(f) \cdot \max\left(\|(P_1^{(k)} - P_1^N)^+\|_1, \|(P_1^{(k)} - P_1^N)^-\|_1\right),$$

which is a contraction by the assumption that
$\sup \alpha_1(f) \cdot \sup \alpha_2(f) < 1$. So,
$P_1^{(k)} \to P_1^N$ in $L_1$-norm as $k \to \infty$. □
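
The iterative waterfilling process used in the uniqueness argument can be sketched numerically. The example below is an illustration under stated assumptions: the band is discretized into unit-width frequency bins, the water level is found by bisection, and the names `waterfill` and `iterative_waterfilling` are hypothetical.

```python
def waterfill(noise, power, iters=60):
    """Waterfill `power` over bins with the given effective noise levels."""
    lo, hi = min(noise), max(noise) + power
    for _ in range(iters):
        level = (lo + hi) / 2.0
        used = sum(max(0.0, level - n) for n in noise)
        if used > power:
            hi = level
        else:
            lo = level
    return [max(0.0, lo - n) for n in noise]


def iterative_waterfilling(N1, N2, a1, a2, P1, P2, rounds=50):
    """Alternate best responses, starting from a flat spectrum for player 1."""
    p1 = [P1 / len(N1)] * len(N1)
    p2 = [0.0] * len(N2)
    for _ in range(rounds):
        p2 = waterfill([n + a * p for n, a, p in zip(N2, a1, p1)], P2)
        p1 = waterfill([n + a * p for n, a, p in zip(N1, a2, p2)], P1)
    return p1, p2


# Mild interference: sup a1 * sup a2 = 0.09 < 1.
N1, N2 = [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]
a1 = a2 = [0.3] * 4
p1, p2 = iterative_waterfilling(N1, N2, a1, a2, 4.0, 4.0)

# At a fixed point, player 2's best response to p1 reproduces p2.
p2_check = waterfill([n + a * p for n, a, p in zip(N2, a1, p1)], 4.0)
```

With the mild-interference condition satisfied, successive iterates settle quickly, consistent with the contraction argument.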
Abstract — We characterize the capacity-achieving distribution for a
class of non-Gaussian additive noise channels, when the transmitter is
subject to an “average” power constraint. Specifically, we show that
if the probability density function of the noise, in addition to
satisfying some mild technical conditions, has a tail which decays at
a rate slower (resp. faster) than the Gaussian, then the
capacity-achieving distribution has bounded (resp. unbounded) support.

I.  Preliminaries 

Consider a (discrete-time) additive noise channel where the
$\mathbb{R}$-valued output corresponding to an $\mathbb{R}$-valued
input $x_t$ is given by

$$Y_t = x_t + Z_t, \quad t \geq 1, \tag{1.1}$$

where $\{Z_t\}_{t=1}^{\infty}$ is the $\mathbb{R}$-valued i.i.d. noise
with $\mathbb{E}[Z_t] = 0$ and
$\mathbb{E}[(Z_t - \mathbb{E}[Z_t])^2] = \sigma_Z^2$.

Assume that the distribution of the rv $Z_t$ admits a probability
density function (pdf) $p_Z$ with respect to the Lebesgue measure.
Then, for any $\mathbb{R}$-valued channel input rv $X_t$ with
distribution $Q$, let $p_Y(\cdot\,; Q)$ denote the resulting pdf of the
output rv $Y_t$. The capacity of the channel in (1.1), subject to an
average power constraint $\mathcal{P}_0 < \infty$, denoted
$C(\mathcal{P}_0)$, is given as

$$C(\mathcal{P}_0) = \max_{Q:\, \mathbb{E}_Q[X_t^2] \leq \mathcal{P}_0} I(Q), \tag{1.2}$$

where $I(Q) = I(X_t \wedge Y_t)$.

Let the marginal entropy density (cf. e.g., Smith (1969)) be given by

$$h(x; Q) = -\int_{-\infty}^{\infty} p_Z(y - x) \log p_Y(y; Q)\, dy, \quad x \in \mathbb{R},\ Q \in A, \tag{1.3}$$

where $A$ denotes the set of all distributions of the
$\mathbb{R}$-valued rv $X_t$.

Next,  we  consider  three  kinds  of  noise  rv  with  pdf  pz  in 
(1.1):  heavy-tailed,  light-tailed  and  bounded. 

A  noise  rv  Zt  will  be  called  heavy-tailed  if  its  pdf  pz  satisfies 
the  following  conditions: 

(A1) it is uniformly continuous;

(A2) for $Q_1, Q_2 \in A$, it holds that if
$p_Y(y; Q_1) = p_Y(y; Q_2)$, $y \in \mathbb{R}$, then $Q_1 = Q_2$;

(H1) there exist (finite) positive constants $k_1, k_2$ and $\rho_h$,
$0 < \rho_h < 2$, such that

$$p_Z(z) \geq k_1 e^{-k_2 |z|^{\rho_h}}, \quad z \in \mathbb{R}; \tag{1.4}$$

(H2) there exist (finite) positive constants $k_3$ and $k_4$, such
that

pz{z)-kl T^’  zeJR-  (L5) 


On the other hand, a noise rv $Z_t$ will be called light-tailed if its
pdf $p_Z$, in addition to satisfying (A1) and (A2), is such that

(L1) there exist (finite) positive constants $c_1, c_2$, and
$\rho_l > 2$, such that

$$p_Z(z) \leq c_1 e^{-c_2 |z|^{\rho_l}}, \quad z \in \mathbb{R}; \tag{1.6}$$


(L2) there exists a (measurable) convex, increasing mapping
$\phi : \mathbb{R}_+ \to \mathbb{R}_+$, such that

$$\phi(z) < \infty, \quad z \in \mathbb{R}_+,$$

$$p_Z(z) \geq e^{-\phi(|z|)}, \quad z \in \mathbb{R},$$

nmm 

< 

oo. 

(L7) 

Finally, a noise rv $Z_t$ will be called bounded if its pdf $p_Z$ is a
bounded function and has bounded support.

II.  Result 

Theorem II.1 If the noise rv $Z_t$ in (1.1) is heavy-tailed, then the
capacity-achieving distribution $Q_0$ in (1.2) has a bounded support.
If the noise pdf has the additional property that for every
$\mathcal{P}_0 > 0$ and $Q \in \mathcal{Q}_{\mathcal{P}_0}$, there
exists an analytic extension of the marginal entropy density
$h(x; Q)$, $x \in \mathbb{R}$, then $Q_0$ has a finite support. On the
other hand, if the noise rv $Z_t$ is light-tailed or has a bounded
support, then $Q_0$ has unbounded support.

Earlier related results can be found in the work of Abou-Faycal, Trott
and Shamai (ISIT 1997).

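The heuristics behind the assertions in Theorem II.1 can be understood
as follows. Denote by $g$ a Gaussian pdf with mean zero and variance
$\mathcal{P}_0 + \sigma_Z^2$, and by $\mathcal{Q}'_{\mathcal{P}_0}$ the
set $\{Q \in \mathcal{Q}_{\mathcal{P}_0} : \mathbb{E}_Q[X_t^2] = \mathcal{P}_0\}$.
Under the conditions (H1, H2) or (L1, L2) and (A1), it holds that

$$C(\mathcal{P}_0) = \max_{Q \in \mathcal{Q}'_{\mathcal{P}_0}} I(Q) \tag{II.1}$$

$$= \frac{1}{2}\left[1 + \log 2\pi(\mathcal{P}_0 + \sigma_Z^2)\right] - h(Z_t) - \min_{Q \in \mathcal{Q}'_{\mathcal{P}_0}} D(p_Y(\cdot\,; Q) \,\|\, g).$$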

Hence, the maximization in (1.2) is equivalent to minimizing the
(Kullback-Leibler) divergence between the pdf of the output rv $Y_t$
and the Gaussian pdf $g$. The assertion in Theorem II.1 is then, in
effect, a reflection of the fact that for a given noise pdf, an input
distribution with a bounded support results in an output pdf which
decays more rapidly than when the input distribution has unbounded
support.
Abstract — The capacity of future high-density magnetic recording
systems is expected to be limited primarily by “jitter”. For such
systems, a new simple channel model is proposed. A factor graph
representation as well as upper and lower bounds on the capacity of
this channel model are given.

As mechanical and electronic components of magnetic recording systems
are being improved, the “noise” of the magnetic medium itself will
begin to dominate other noise sources [1]. This “medium noise” is
highly signal-dependent [2] and comes in two different forms. First,
isolated transitions (i.e., changes of magnetic polarization) are
affected by jitter [3]: the transition is read at a different position
than where it was written. Second, very short polarization regions
tend to be unstable: the two transitions move towards, and may
actually cancel, each other.

The  present  paper  addresses  the  problem  of  modeling  these 
effects  in  a  way  that  is  suitable  for  signal  processing.  To  this 
end,  the  magnetic  recording  channel  is  first  decomposed  into 
three  parts:  a  “binary  jitter  channel”  (BJC)  that  captures 
the  mentioned  medium  noise,  a  linear  intersymbol  interference 
channel  that  is  defined  by  the  impulse  response  of  the  read 
head,  and  additive  white  Gaussian  noise  due  to  the  amplifier. 
The  BJC  is  then  further  decomposed  as  follows. 

Let $X_k \in \{0, 1\}$ and $Y_k \in \{0, 1\}$ be the time-$k$ input and
output, respectively, of the BJC, where $X_k = 1$ ($Y_k = 1$) means
that a transition is written into (read from) the time-$k$ slot. The
BJC $X_k \to Y_k$ is decomposed into a memoryless probabilistic channel
$X_k \to J_k$ and a deterministic channel $J_k \to Y_k$ with memory. The
auxiliary variable $J_k$ takes values in the set
$\{0\} \cup \{D^i : i = -m, -m+1, \ldots, m\}$ for some positive integer
$m$; $J_k = D^j$ means that a transition was written into the time-$k$
slot and moved into slot $k + j$. We mainly consider the simplest case
with $m = 1$, $p_{J|X}(D^{\pm 1}|1) = p$, and $p_{J|X}(1|1) = 1 - 2p$.
We always have $p_{J|X}(0|0) = 1$.
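
The memoryless part of the decomposition, for the case $m = 1$, can be simulated with a few lines of Python. This is a sketch only: it samples the shift of each written transition and does not model the deterministic channel with cancellations. The representation of a jitter symbol as `None` (no transition) or an integer shift, and the name `sample_jitter`, are assumptions made for illustration.

```python
import random


def sample_jitter(x, p, rng):
    """Sample the jitter symbol for each input bit (case m = 1).

    Returns None where no transition is written, otherwise the shift
    j in {-1, 0, +1}: each of -1 and +1 has probability p, and 0 has
    probability 1 - 2p.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 0.5]")
    shifts = []
    for bit in x:
        if bit == 0:
            shifts.append(None)
            continue
        r = rng.random()
        if r < p:
            shifts.append(-1)
        elif r < 2 * p:
            shifts.append(+1)
        else:
            shifts.append(0)
    return shifts


x = [1, 0, 0, 1, 0, 1]
no_jitter = sample_jitter(x, 0.0, random.Random(0))
always_shifted = sample_jitter(x, 0.5, random.Random(0))
```

A transition written in slot $k$ with shift $j$ would then be offered to the deterministic channel at slot $k + j$.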

The deterministic channel $J_k \to Y_k$, which takes into account the
cancellation of transitions that fall into the same slot or cross, can
be described by a trellis. For $m = 1$, this trellis has 4 states and
16 branches.

The factor graph [4] that corresponds to this BJC model is shown in
Fig. 1. This factor graph can be plugged into a block factor graph (as
in Fig. 2) of the whole system. The sum-product algorithm
(“probability propagation”) [4] can then be applied to “turbo”
decoding of such a system.

1Partial support from NSF Grant CCR-9904458.

2This work was supported by NSF Grant CCR-9984515.

3This work was supported by Grant TH-16./99-3.

The mentioned cancellation of crossing transitions makes it difficult
to compute the capacity of the BJC. However, methods similar to those
of [5] (where cancellations were not considered) can be used to obtain
tight upper and lower bounds by optimization of
Cl  =  maxpx  [tf^Y^1)  -  //(YlIX^+J and 

C?  =  maxPx  [H(YL\YtZ1SLL)-H(YL\XtLLt\Yil1)]  re¬ 
spectively,  where  S'_L 's  the  state  of  the  BJC  composed  of  the 
time-$(-L)$ state and $M$ prior inputs. The input is assumed to be
stationary and generated by a Markov chain of order $M$. Then
$C_L^M \leq C^M \leq C_U^M$, and $C^M$ approaches capacity for
$M \to \infty$. Fig. 3 shows upper and lower bounds on the capacity of
the BJC for $(1, \infty)$-constrained input sequences, i.e., there is
at least one zero between two ones.


Fig.  1:  Factor  graph  representation  of  the  BJC. 
Fig. 2: Block factor graph.

Fig. 3: Bounds of the BJC for $(1, \infty)$-constrained inputs.
Abstract — How much traffic can wireless networks carry? Consider $n$
nodes located in a disk of area $A$ sq. meters, each capable of
transmitting at a data rate of $W$ bits/sec. Under a protocol based
model for successful receptions, the total network can carry only
$\Theta(W\sqrt{An})$ bit-meters/sec, where 1 bit carried a distance of
1 meter is counted as 1 bit-meter. This is the best possible even
assuming the node locations, traffic patterns, and the range/power of
each transmission are all optimally chosen. If the node locations and
their destinations are randomly chosen, and all transmissions employ
the same power/range, then each node only obtains a throughput of
$\Theta(W/\sqrt{n \log n})$ bits/sec, if the network is optimally
operated. Similar results hold for a physical SIR based model.

I.  Introduction 

Consider a network with $n$ nodes located in an area of $A$ sq. m.
Every node can transmit at a data rate of $W$ bits/sec over a common
channel. Due to interference between transmissions, we need to specify
when transmissions are successfully received. We allow for two models.

The Protocol Model: For a node to receive a transmission at a range
$r$, there can be no other simultaneous transmissions within a range
$(1 + \Delta)r$ from it. (Or one can assume that interference rules out
any other receptions in a disk of radius $(1 + \Delta)r$ around a
transmitter of range $r$.)

The Physical Model: Assume that path-loss can be modeled as
$r^{-\alpha}$ where $\alpha > 2$, and that there is ambient noise of
power level $N$. Then a transmission by node $X_i$ at a power level
$P_i$ is successfully received by node $X_j$ if and only if the
signal-to-interference ratio (SIR) is at least $\beta$, i.e.,

$$\frac{P_i\, |X_i - X_j|^{-\alpha}}{N + \sum_{k \in \mathcal{T}} P_k\, |X_k - X_j|^{-\alpha}} \geq \beta.$$

Above, $\mathcal{T}$ is the set of all other nodes which are
transmitting at the very same time, and $P_k$ is the power level of
node $X_k$.
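
The reception criterion of the Physical Model can be checked with a short Python function. This is a sketch for illustration: node positions are assumed to be points in the plane, and the names `received_power` and `is_received` are hypothetical.

```python
import math


def received_power(power, src, dst, alpha):
    """Power received at `dst` from a transmitter at `src`, path loss r**(-alpha)."""
    return power * math.dist(src, dst) ** (-alpha)


def is_received(tx_pos, tx_power, rx_pos, interferers, noise, alpha, beta):
    """Return True if the SIR at the receiver is at least beta.

    `interferers` is a list of (position, power) pairs for all other
    nodes transmitting at the same time.
    """
    signal = received_power(tx_power, tx_pos, rx_pos, alpha)
    interference = sum(
        received_power(p, pos, rx_pos, alpha) for pos, p in interferers
    )
    return signal / (noise + interference) >= beta


alone = is_received((0.0, 0.0), 1.0, (1.0, 0.0), [], 0.1, 3.0, 5.0)
jammed = is_received(
    (0.0, 0.0), 1.0, (1.0, 0.0), [((2.0, 0.0), 1.0)], 0.1, 3.0, 5.0
)
```

With no interferer the SIR is 10, above the threshold; an equal-power interferer at the same distance from the receiver brings it below 1.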

II.  The  Best  Case  Scenario 

The  destinations  of  nodes  are  allowed  to  be  arbitrary,  as  are 
the  traffic  levels  for  OD-pairs.  Each  transmission  may  be  of 
an  arbitrary  range/power. 

We  say  that  the  network  has  transported  1  bit-meter  when 
1  bit  has  been  transmitted  over  a  distance  of  1  meter. 

1  This  material  is  based  upon  work  partially  supported  by  the  Air 
Force  of  Scientific  Research  under  Contract  No.  AF-DC-5-36128, 
the  Office  of  Naval  Research  under  Contract  No.  N00014-99-0696, 
and  EPRI  and  DOD-ARO  under  subcontract  Nos.  W08333-04  and 
35352-6086.  Any  opinions,  findings,  and  conclusions  are  those  of 
the  authors  and  do  not  necessarily  reflect  the  views  of  the  above 
agencies. 

2Please  address  all  correspondence  to  the  second  author. 


III.  The  Random  Scenario 

We assume that the $n$ nodes are randomly located (uniform iid) in a
disk of area $A$ sq. m. Each node has a random destination, chosen as
the node nearest to a uniform and iid chosen point, to which it wishes
to send traffic at a rate $\lambda(n)$ bits/sec. We suppose that in
this homogeneous environment all transmissions employ the same range
or power.

We say that the throughput capacity is $\Theta(f(n))$ bits/sec if for
some constants $0 < c < c' < +\infty$,

$$\lim_{n \to \infty} \mathrm{Prob}\left(\lambda(n) = c f(n) \text{ is feasible for each node}\right) = 1$$

$$\liminf_{n \to \infty} \mathrm{Prob}\left(\lambda(n) = c' f(n) \text{ is feasible for each node}\right) = 0.$$

IV.  The  Main  Results 

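Theorem 1. Best Case for Protocol Model: The transport capacity is
$\Theta(W\sqrt{An})$ bit-meters/sec. More specifically,

$$\frac{W\sqrt{A}}{1 + 2\Delta} \cdot \frac{n}{\sqrt{n} + \sqrt{8\pi}} \;\leq\; \text{Transport capacity} \;\leq\; \sqrt{\frac{8}{\pi}}\, \frac{W\sqrt{A}}{\Delta}\, \sqrt{n}.$$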

Theorem 2. Best Case for Physical model: A transport capacity of
$cW\sqrt{An}$ bit-meters/sec is feasible, while
$c'W\sqrt{A}\, n^{\frac{\alpha - 1}{\alpha}}$ bit-meters/sec is not. More
specifically,

“  W \[A  n  Transport 

y/n  +  v/8rr  ~  capacity 

<  7r-5  [2  +  2//3]“  W'/An^. 

Theorem 3. Random Case for Protocol Model: The throughput capacity is
$\Theta(W/\sqrt{n \log n})$ bits/sec.

Theorem 4. Random Case for Physical Model: A throughput
$\lambda(n) = cW/\sqrt{n \log n}$ bits/sec is feasible, while
$\lambda(n) = c'W/\sqrt{n}$ is not, for appropriate values of $c$ and
$c'$, both with probability approaching one as $n \to +\infty$.

V.  Concluding  Remarks 

Under a protocol model, the best per-node throughput for a wireless
network with $n$ nodes, with each node having a destination
non-vanishingly far away, is $\Theta(1/\sqrt{n})$ bits/sec. If the
nodes are randomly located, the per-node throughput is
$\Theta(1/\sqrt{n \log n})$ bits/sec. The random case is nearly best.
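
To give a feel for how close the two scaling laws are, the following Python sketch evaluates $1/\sqrt{n}$ and $1/\sqrt{n \log n}$ for a few network sizes. The order notation hides constants, so these numbers show only the growth rates, not actual throughputs. The function name `per_node_scaling` and the use of the natural logarithm are assumptions.

```python
import math


def per_node_scaling(n):
    """Return (best-case, random-case) per-node scaling factors for n nodes."""
    best = 1.0 / math.sqrt(n)
    rand = 1.0 / math.sqrt(n * math.log(n))
    return best, rand


# Scaling factors and their ratio, which grows only like sqrt(log n).
table = {n: per_node_scaling(n) for n in (100, 10_000, 1_000_000)}
ratios = {n: best / rand for n, (best, rand) in table.items()}
```

The ratio between the two cases grows only like $\sqrt{\log n}$, which is the sense in which the random case is nearly best.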

Thus, in wireless networks, compromises should be made either with
respect to the number of nodes involved, or basically only nearest
neighbor communication should be envisaged. Other conclusions
following from the constructive proof of capacity in the random case
are: a cellular operation is feasible, the range of nodes is about
$\Theta(\sqrt{A \log n / (\pi n)})$, and the fraction of time a node is
busy is only $\Theta(1/\log n)$.
Fig. 1: Signal estimation from channel-corrupted encodings.


Abstract  —  We  develop  extensions  to  our  techniques 
in  [1]  for  signal  estimation  from  sequential  encodings 
in  the  form  of  quantized  measurements  communicated 
over  binary  symmetric  channels.  We  show  that  the 
channel  quality  affects  not  only  the  quality  of  the 
encoding  but  also  its  optimality.  We  also  construct 
encodings  from  optimized  pseudo-noise  and  feedback- 
based  control  inputs,  and  efficient  signal  estimators 
from  channel  corrupted  versions  of  the  encodings. 

I.  Introduction 

We  consider  sequential  signal  encoding  strategies  in  the 
form  of  quantizer  bias  control  for  wireless  sensor  networks 
where  the  sensor  encodings  are  communicated  to  the  host  over 
a  binary  symmetric  channel  (BSC). 

In [1] we focused on the case where the encodings are communicated
error-free to the host and showed that a properly designed control
input added to the noisy signal prior to quantization can improve the
effective digital encoding. For this scenario, we developed optimized
control input selection strategies and associated estimators for
several different scenarios. These methods can be viewed as lossy
encoding techniques of a noisy source [2, 3, 4] that are sequential.

We develop extensions of these approaches for the case that each
communication link is a BSC. The block diagram involving a single
sensor and the host is shown in Fig. 1, where Xu[n] is a control
input, w[n] is zero-mean IID Gaussian sensor noise, y[n] denotes the
binary quantized signal sent to the host, and z[n] denotes the
encoding sequence received by the host.

II.  Main  Results  and  Discussion 

We focus on two special cases. First, we consider pseudo-noise control
inputs whose statistical characterization alone is exploited at the
host for estimation. Second, we examine the effects of BSC errors when
knowledge of the control input can be exploited for estimation at the
host and where, in addition, feedback information from the host to the
sensor is available and can be exploited in the selection of the
control input.

For pseudo-noise control inputs that are accurately modeled as sample
paths of a zero-mean IID Gaussian process, we show the optimal
pseudo-noise level is an increasing function of the BSC error
probability $p_e$; in particular, as $p_e$ is varied from 0 to 0.5, the
optimal aggregate (sensor plus pseudo-noise) level
$\sigma_{\mathrm{opt}}$ changes monotonically from
$\approx 2\Delta/\pi$ to $\Delta$, where $\Delta$ denotes the signal
dynamic range. Hence, in order to accurately optimize the quality of
the pseudo-noise encoding at the sensor it is important to take into
account the quality of the BSC. We also show that, if the encoder does
not know the fidelity of the BSC, choosing
$\sigma_{\mathrm{opt}}(p_e \to 0.5)$ achieves the best performance
across the $p_e$ spectrum.


In the second case the host knows the control input and can also
broadcast feedback information (based on past received encodings)
which can be used by each sensor in the selection of future control
input values. For any given $p_e$ we derive a bound on the MSE
performance of any feedback-based control input strategy and develop
control selection methods and computationally efficient estimation
algorithms that achieve this bound. Again, both the optimized encoding
and estimation algorithms depend on the BSC quality.

We also develop extensions of these bounds and algorithms that achieve
them for the more practical case where the sensor receives
noise-corrupted feedback information from the host, and, in
particular, the case where the (additive) feedback noise is well
modeled as a zero-mean IID Gaussian random process. Although the
estimators we develop effectively achieve the associated bound
provided the number of observations is large enough, the larger the
$p_e$ level the larger the number of observations required to
effectively achieve the bound.

Finally, as in the case $p_e = 0$ that was considered in detail in [1],
for any given $p_e$ the asymptotic performance of these systems can be
completely characterized by means of the signal-to-noise ratio (SNR).
In particular, for any given $p_e$ level, the MSE performance loss with
pseudo-noise control inputs can be made to grow quadratically with SNR
by judicious selection of the pseudo-noise power level, while a fixed
loss independent of SNR can be achieved in the feedback cases. For
comparison, the MSE performance loss due to the encoding in the
absence of control input can be shown to grow exponentially with SNR
for any given $p_e$ value.
Abstract: In this paper, wireless sensor networks are considered
from an information theory point of view, and the rate distortion region
for the special case of correlated Gaussian sources, where n sources
provide partial side information to one main source, is discussed.

I.  Introduction 

It is well known from the theory of distributed detection that higher
reliability and lower probability of detection error can be achieved when
observation data from multiple, distributed sources is intelligently fused in a
decision making algorithm, rather than using a single observation data set
[1, 3]. In addition, advances in fabrication technology have made
low-cost sensors incorporating wireless transceivers, signal
processing and sensing in one integrated package a desirable option.
It is therefore inevitable that such devices will be widely used in detection
applications such as security, monitoring, diagnostics, remote exploration,
etc. This has given rise to the development of wireless integrated
networked sensors (WINS) [2].

However, the effective deployment of such distributed processing
systems introduces some significant design issues, most notably:
networking and communication protocols, transmission channel and power
constraints, and scalability, among others [1]. These are not the subject of
this summary. However, it is evident that some fundamental limits are
required to assess the optimality of any system design with regard to the
“best design”. Thus, an information theoretic analysis of the system is
required. We consider a special case of this problem.

II. The n-Helper System


Consider  the  multisensor  system  as  shown  below. 


A cluster of sensor nodes with a main
source ($X$) and $n$ helpers ($Y_i$)


Figure  1:  A  multi-node  networked  sensor  system. 

A portion of a distributed cluster of sensor nodes (perhaps mobile) is
observing a phenomenon and generating source data. Algorithms exist
which can determine which nodes in the proximity of the phenomenon need
to be activated and which can remain dormant [1]. Once this boot-up
process is completed, the node observation data is assumed to be Gaussian
(for analytical simplicity), with one data node acting as the main data source
(e.g. that which is closest to the phenomenon), and the remaining nodes
generating correlated data. The coding challenge is then to determine
appropriate codes and data rates such that the gateway/data-fusion center
can reproduce the data from the main node using the remaining nodes as
sources of partial side information, subject to some distortion criteria.


III.  A  Rate-Distortion  Bound 

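Thus for a main source, $X$, and $n$ correlated sources, $Y_i$, with
$\{X_t, Y_{1t}, \ldots, Y_{nt}\}_{t=1}^{\infty}$ being stationary Gaussian
memoryless sources, for each observation time, $t = 1, 2, 3, \ldots$, we let
the random $(n+1)$-tuple $(X_t, Y_{1t}, \ldots, Y_{nt})$ take values in
$\mathcal{X} \times \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$. The
covariance matrix is denoted as:

$$\begin{pmatrix}
\sigma_X^2 & \rho_{XY_1}\sigma_X\sigma_{Y_1} & \cdots & \rho_{XY_n}\sigma_X\sigma_{Y_n} \\
\rho_{XY_1}\sigma_X\sigma_{Y_1} & \sigma_{Y_1}^2 & \cdots & \rho_{Y_1Y_n}\sigma_{Y_1}\sigma_{Y_n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{XY_n}\sigma_X\sigma_{Y_n} & \rho_{Y_1Y_n}\sigma_{Y_1}\sigma_{Y_n} & \cdots & \sigma_{Y_n}^2
\end{pmatrix}$$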

Then for an encoding system using the $Y_i$'s as $n$ helpers, the
rate-distortion region, given by:

$$\mathcal{R}(D_X, D_{Y_1}, \ldots, D_{Y_n}) = \{(R_X, R_1, \ldots, R_n) : (R_X, R_1, \ldots, R_n) \text{ is admissible}\}$$

for a given set of rates and distortion measures is desired. The encoding
functions $\varphi_X : \mathcal{X}^m \to \mathcal{R}_X = \{1, \ldots, C_X\}$ and
$\varphi_i : \mathcal{Y}_i^m \to \mathcal{R}_i = \{1, \ldots, C_i\}$ are
such that the rate constraints being satisfied are
$\frac{1}{m} \log C_i \le R_i + \delta$, $i = X, 1, 2, \ldots, n$.
Extending previous results [4-6], we show that for an admissible
rate $(R_X, R_1, R_2, \ldots, R_n)$, and for some $D_i$'s $> 0$, the $n$-helper
system data rates can be fused to yield an effective data rate (with respect
to source $X$) satisfying the following lower bound:

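$$R_X \ge \frac{1}{2} \log \left[ \frac{\sigma_X^2}{D_X} \prod_{k=1}^{n} \left( 1 - \rho_{XY_k}^2 + \rho_{XY_k}^2 \, 2^{-2R_k} \right) \right]$$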

Future  work  will  attempt  to  extend  these  results  for  non-Gaussian, 
non-stationary  sources. 
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Abstract  —  This paper considers a distributed binary
detection system with $n$ binary independent identical sensors. We show
that the system-wise probability of error is a quasi-convex function of
the local threshold $\tau$ for generalized Gaussian noise and some
non-Gaussian noise distributions. This yields a globally optimal and
computationally feasible solution technique.

Consider distributed detection of $s \in \{-m, m\}$, where the
$i$th of $n$ local sensors observes $x_i = s + z_i$ with i.i.d. noise
$z_i$. The $i$th sensor compares $x_i$ to a threshold $\tau$ to compute a
binary decision $u_i$ to be 0 when $x_i < \tau$ and 1 otherwise.

Each binary decision $u_i$ is transmitted to a fusion center,
which applies a fusion rule $F$ to $k = \sum_{i=1}^{n} u_i$ to produce the
final decision $F(k)$.

The identical threshold $\tau$ in local sensors generally does
not result in an optimum system. However, results in [1]
showed that identical local detectors are asymptotically optimum
when the number of sensors $n$ tends to infinity. Even
with identical local thresholds, the problem is still complicated
by the discontinuity of the Bayesian error probability and
the existence of multiple local minima. Reference [2] provided continuous
bounds on the Bayesian error probability, but the minimization
problem still has local minima.

In this paper we show that for any admissible fusion rule $F$
(i.e. any $F$ that is optimal for at least one $\tau$), the probability of
error is a quasi-convex function of $\tau$. The admissible functions
$F$ are simply threshold tests of the form

$$F_i(k) = \begin{cases} -m & \text{if } k < i \\ m & \text{if } k \ge i. \end{cases} \qquad (1)$$


Hence, the problem decomposes into a series of $n$ quasi-convex
optimization problems. We have used this technique
to identify the optimal $(\tau, F)$ pairs for a variety of cases, and
our results suggest that the optimal $F$ is always essentially
majority vote for the equal a-priori probability case. According
to this conjecture, the optimal $F$ is identified without computation
and only one quasi-convex problem needs to be solved.

For brevity, we prove the quasi-convexity for the Gaussian
noise case, i.e. $z_i \sim \mathcal{N}(0, 1)$. See [3] for a more extended
presentation showing the quasi-convexity for generalized Gaussian
noise and some well known non-Gaussian noises.

First define

$$A_k(\tau) = p_0 \binom{n}{k} Q^k(\tau + m) \, Q^{n-k}(-\tau - m),$$

$$B_k(\tau) = p_1 \binom{n}{k} Q^k(\tau - m) \, Q^{n-k}(-\tau + m),$$

where $p_0$ is the a-priori probability of $s = -m$ and
$p_1 = 1 - p_0$. $Q(t) = \int_t^{\infty} f(x) \, dx$, where $f(x)$ is the pdf of
the normal distribution with zero mean and unit variance.


Theorem 1 Only fusion rules of the form $F_i$ in (1) are admissible,
i.e. are MAP for some choice of $\tau$.

Proof: For every $\tau$ the MAP $F$ has the form $F_i$ in (1).
Theorem 1 states the same admissible fusion rule as observed
in [4].

Theorem 2 For a fixed admissible fusion rule $F_i$, the probability
of error is a quasi-convex function of $\tau$.

Proof: Define $P_e(\tau, i)$ as the probability of error for $(\tau, F_i)$.
$P_e(\tau, i)$ can be expressed as

$$P_e(\tau, i) = p_0 + \sum_{k=0}^{i-1} \left( B_k(\tau) - A_k(\tau) \right). \qquad (2)$$

The derivative of $P_e(\tau, i)$ is a telescoping sum (i.e. has the
form $\sum_{k=0}^{i-1} (f_k(\tau) - f_{k-1}(\tau))$ for a specific $f_k(\tau)$) and can be
simplified to $P_e'(\tau, i) = \alpha(\tau)(\beta(\tau) - \gamma(\tau))$, where $\alpha(\tau)$ is always
positive, $\beta(\tau)$ is a positive and monotonically increasing function
and $\gamma(\tau)$ is a positive and monotonically decreasing function.
Moreover, $\beta(-\infty) < \gamma(-\infty)$ and $\beta(\infty) > \gamma(\infty)$. So $P_e'(\tau, i) = 0$ for
only one $\tau^*$, for which $\beta(\tau^*) = \gamma(\tau^*)$. For $\tau < \tau^*$, $P_e'(\tau, i) < 0$,
and for $\tau > \tau^*$, $P_e'(\tau, i) > 0$. So $P_e(\tau, i)$ is quasi-convex. □
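The quasi-convexity result can be illustrated numerically. The following is a minimal sketch for the Gaussian case, assuming that $A_k$ and $B_k$ carry the binomial factor $\binom{n}{k}$ and that the fusion rule decides $m$ when $k \ge i$; the function names, the search interval and the ternary search itself are illustrative choices, not part of the paper.

```python
import math


def q_func(t):
    """Gaussian tail probability Q(t) = P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def error_probability(tau, i, n, m, p0=0.5):
    """P_e(tau, i) = p0 + sum_{k < i} (B_k(tau) - A_k(tau)), as in (2)."""
    p1 = 1.0 - p0
    total = p0
    for k in range(i):
        c = math.comb(n, k)
        a_k = p0 * c * q_func(tau + m) ** k * q_func(-tau - m) ** (n - k)
        b_k = p1 * c * q_func(tau - m) ** k * q_func(-tau + m) ** (n - k)
        total += b_k - a_k
    return total


def best_threshold(i, n, m, p0=0.5, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the minimizing tau (relies on quasi-convexity)."""
    for _ in range(iters):
        t1 = lo + (hi - lo) / 3.0
        t2 = hi - (hi - lo) / 3.0
        if error_probability(t1, i, n, m, p0) < error_probability(t2, i, n, m, p0):
            hi = t2  # the minimizer cannot lie to the right of t2
        else:
            lo = t1  # the minimizer cannot lie to the left of t1
    return 0.5 * (lo + hi)
```

For example, with n = 5 sensors, m = 1, equal priors and the majority rule i = 3, the problem is symmetric in $\tau$, so the search should return a threshold close to zero. Repeating the search for each i and keeping the smallest error gives the decomposition into scalar problems described above.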

[5] generalizes these results by showing quasiconvexity in
the likelihood ratio function for any distribution on the i.i.d.
observations $x_i$. The quasiconvexity can also be extended to the
Bayesian cost function $\Re(\tau, i)$.

Using quasiconvexity we examined the SNR required for
$P_e = 10^{-5}$ as a function of the number of sensors [3]. For
Gaussian noise, we found that the number of binary sensors
needed for every SNR is fewer than twice the number
of infinite-precision sensors. This can make the binary sensor
a better choice from a practical or economic point of view.
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Abstract  —  We  find  the  capacity  regions  of  large 
CDMA  systems  with  linear  receivers  and  random 
spreading  subject  to  slow  fading  (nonergodic  channel) 
and  fast  fading  (ergodic  channel). 

We consider the uplink of a single-cell, synchronous DS-CDMA
system with $K$ users and random spreading sequences
of $L$ chips. The received signal $L$-chip column vector corresponding
to one symbol interval is given by

$$\mathbf{y} = \sum_{k=1}^{K} \sqrt{z_k} \, x_k \mathbf{s}_k + \mathbf{n} \qquad (1)$$

where $\mathbf{n} \sim \mathcal{N}_c(0, N_0 \mathbf{I})$, $x_k$ is the complex modulation symbol
of user $k$, $\mathbf{s}_k$ is the spreading sequence of user $k$, made of binary
antipodal chips $\pm 1/\sqrt{L}$ generated at random with uniform
probability, and $z_k$ is the flat fading power gain. We
assume that the base station receiver has perfect knowledge
of all fading gains and phases, and without loss of generality,
we include the phase rotation of the $k$-th channel into the
modulation symbol $x_k$. User $k$ is received with signal-to-noise
ratio (SNR) $\gamma_k = z_k \Gamma_k$, where $\Gamma_k$ is the transmit SNR. As
in [3, 4], we consider an asymptotically large system with $L \to \infty$
and $K/L \to \alpha$. The receiver for user 1 (our reference
user) is defined by $y_1 = \mathbf{h}_1^H \mathbf{y}$ followed by a single-user decoder
operating on the sequence of filter outputs $y_1$. The filter $\mathbf{h}_1$
can be either a single-user matched filter (SUMF) or a linear
MMSE filter [1]. Under the above assumptions, the output
SINR $\beta_1$ of receiver 1 satisfies [3]:

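$$\beta_1 = \begin{cases}
\dfrac{\gamma_1}{1 + \alpha \int_0^\infty x \, dF_\gamma(x)} & \text{SUMF} \\[2ex]
\dfrac{\gamma_1}{1 + \alpha \int_0^\infty \dfrac{x \gamma_1}{\gamma_1 + x \beta_1} \, dF_\gamma(x)} & \text{MMSE}
\end{cases} \qquad (2)$$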

where $F_\gamma(x)$ is the limiting cdf of the received user SNRs.
In the following, we assume that users are partitioned into
$J$ classes. Each class $j$ is characterized by a transmit SNR
$\Gamma_j$. Each class has $p_j K$ users, where $\sum_{j=1}^{J} p_j = 1$, and the
$z_k$ are i.i.d. and normalized, so that $\int_0^\infty x \, dF_z(x) = 1$. Then,
$F_\gamma(x) = \sum_{j=1}^{J} p_j F_z(x/\Gamma_j)$, where $F_z(x)$ is the fading cdf. Let
user 1 belong to class $i$. Because of the uncompensated fading,
user 1 SINR is a random variable $\beta_{i,1}$. However, the ratio
$\xi = \beta_{i,1}/(\Gamma_i z_1)$ is non-random and independent of $i$, and can
be calculated from (2).

Non-ergodic fading. In this case, we assume that the
fading time-variations are very slow so that the output SINR
is random but constant over one code word. Outage probability
for users of class $i$ is given by
$P_{\mathrm{out},i} = P(\beta_{i,k} < \beta_i) = F_z\left(\frac{\beta_i}{\Gamma_i \xi}\right)$,
where $\beta_i$ is a SINR threshold that depends on the
coding scheme of class $i$. Assuming Gaussian codes and minimum
distance decoding at the output of the receiving filter,
we let $\beta_i = 2^{R_i} - 1$.




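Let $\bar{\Gamma} = (\bar{\Gamma}_1, \ldots, \bar{\Gamma}_J)$ be a vector of input SNR constraints,
$\epsilon = (\epsilon_1, \ldots, \epsilon_J)$ be a vector of target outage probabilities, and
$\mathbf{R} = (R_1, \ldots, R_J)$ be a vector of coding rates. We find the
outage capacity, i.e., the set $\mathcal{R} \subset \mathbb{R}_+^J$ of rate vectors $\mathbf{R}$ that
can be assigned to the $J$ classes such that, for all $i = 1, \ldots, J$,
$P_{\mathrm{out},i} \le \epsilon_i$ and $\Gamma_i \le \bar{\Gamma}_i$. By letting

$$\rho_i = \frac{2^{R_i} - 1}{\sup\{x \in \mathbb{R}_+ : F_z(x) \le \epsilon_i\}} \qquad (3)$$

we rewrite the outage constraint as $\Gamma_i \xi \ge \rho_i$. For maximum
$\mathbf{R}$, this must hold with equality, which implies that $\Gamma_i/\rho_i = \kappa$
is a constant independent of $i$. Solving for $\kappa$ and imposing the
input constraints, we obtain the capacity inequality

$$\alpha \sum_{j=1}^{J} p_j B_j \le \min_j \left\{ 1 - \frac{\rho_j}{\bar{\Gamma}_j} \right\} \qquad (4)$$

where the effective bandwidth $B_j$ is given by

$$B_j = \begin{cases}
\rho_j & \text{SUMF} \\[1ex]
\int_0^\infty \dfrac{\rho_j x}{1 + \rho_j x} \, dF_z(x) & \text{MMSE}
\end{cases} \qquad (5)$$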
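The capacity inequality can be checked numerically for a given rate assignment. The sketch below covers the SUMF case only and assumes that (3)-(5) read $\rho_i = (2^{R_i} - 1)/\sup\{x : F_z(x) \le \epsilon_i\}$, $\alpha \sum_j p_j B_j \le \min_j \{1 - \rho_j/\bar{\Gamma}_j\}$ and $B_j = \rho_j$. Rayleigh fading with unit mean power gain, $F_z(x) = 1 - e^{-x}$, is an assumption made here for illustration and is not specified in the summary; all function and parameter names are hypothetical.

```python
import math


def rho(rate_bits, outage_target):
    """rho_i from (3), assuming Rayleigh fading: F_z(x) = 1 - exp(-x).

    For this cdf, sup{x : F_z(x) <= eps} = -ln(1 - eps).
    """
    return (2.0 ** rate_bits - 1.0) / (-math.log(1.0 - outage_target))


def sumf_rates_feasible(alpha, fractions, rates, outage_targets, snr_limits):
    """Check the assumed form of inequality (4) with B_j = rho_j (SUMF)."""
    rhos = [rho(r, e) for r, e in zip(rates, outage_targets)]
    load = alpha * sum(p * b for p, b in zip(fractions, rhos))
    slack = min(1.0 - b / g for b, g in zip(rhos, snr_limits))
    return load <= slack
```

With a single class, a rate of 1 bit per symbol, a 10% outage target and an input SNR limit of 100, the rate is feasible at load $\alpha = 0.05$ but not at $\alpha = 0.2$ under these assumptions.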

Ergodic fading. In this section we assume that the fading
is sufficiently fast that the channel can be considered information
stable [2]. Assuming that all users generate their code books
according to a complex circularly-symmetric Gaussian pdf,
users in class $i$ can communicate reliably at rate

$$R_i = \int_0^\infty \log_2(1 + x \xi \Gamma_i) \, dF_z(x)$$

We find the set of rates $\mathbf{R} = (R_1, \ldots, R_J)$ achievable with
input constraints $\Gamma \le \bar{\Gamma}$. Since the function
$f(y) = \int_0^\infty \log_2(1 + xy) \, dF_z(x)$ is monotonically increasing, we define
$\nu_i = f^{-1}(R_i)$; then $\Gamma_i \xi \ge \nu_i$, and from an argument similar to the
one above we obtain a capacity inequality of the same form as (4),
with $\nu_j$ in place of $\rho_j$. It follows that the effective
bandwidth $B_j$ in the ergodic case has the same form as (5),
with $\nu_j$ in place of $\rho_j$.
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Abstract  —  A family of multiuser detectors is analyzed
which require neither matrix inversions nor other operations with
significant complexity. The time complexity per bit of most of them is
independent of the number of users. Nevertheless, their spectral
efficiency for random spreading sequences is shown to
be not far behind that of linear MMSE detection.


I.  Introduction 

Recently, the performances of well-known linear and nonlinear
multiuser detectors in random environments were analyzed in
[1, 2, 3], revealing important gains over the spectral efficiency
of the single-user matched filter. The price for those improvements
is receiver complexity.

An important class of multiuser receivers with lower complexity
is based on the idea of approximate decorrelation (AD)
[4] (a generalization to approximate MMSE detectors is straightforward):
matrix inversion is approximated via an Lth order polynomial
expansion $M^{-1} \approx \sum_{l=0}^{L} w_l M^l$, see e.g. [5], with properly
chosen weights $w_l$.
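As an illustration of the polynomial-expansion idea, the sketch below approximates a matrix inverse by the truncated Neumann series $\sum_{l=0}^{L} (I - M)^l$, which is a polynomial of order $L$ in $M$. This is only one particular, non-optimized (Taylor-type) choice of weights and is not the optimized weighting analyzed in this summary. The function names and the test matrix are hypothetical, and the series converges only when the spectral radius of $I - M$ is below one.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(m, order):
    """Approximate M^{-1} by sum_{l=0}^{order} (I - M)^l, via Horner's rule."""
    n = len(m)
    eye = identity(n)
    e = [[eye[i][j] - m[i][j] for j in range(n)] for i in range(n)]  # E = I - M
    x = identity(n)
    for _ in range(order):
        ex = mat_mul(e, x)
        x = [[eye[i][j] + ex[i][j] for j in range(n)] for i in range(n)]  # X <- I + E X
    return x
```

Each iteration costs one matrix product and no inversion; in a receiver the same recursion could be applied to a vector instead of a full matrix, which is where the complexity saving would come from.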

II.  Main  Results 

Let $y = S^H(Sx + n)$ denote the vector notation of a synchronous
$K$ user Gaussian CDMA channel with $x$, $y$ denoting
the transmitted and received symbols, respectively, $n$ the complex
additive white Gaussian noise of variance $\sigma^2$ and $S$ the
$N \times K$ matrix of complex signature sequences. In this summary,
we restrict attention to equal received powers and we
assume that the diagonal elements of the matrix $R = S^H S$
equal unity.

Theorem 1 Let $K, N \to \infty$, but $0 < \beta = K/N < \infty$, and let the
random components of $S$ be independent with finite variance.
Then, the signal-to-interference-and-noise ratio at the output
$d = Ty$ of any linear equalizer described by a matrix
$T = \sum_{l=0}^{L} w_l(\beta, \sigma) R^l$ converges almost surely to a deterministic
scalar for arbitrary weight functions $w_l(\beta, \sigma)$, $0 \le l \le L$, and
arbitrary order $L$.


Theorem 1 yields explicit expressions for the SIR
of the Lth order approximation to the MMSE multiuser detector.
The results for L = 1, 2, 3 are the following:


maxSIRi  -A 

W{ 

max  SIR-2  — >■ 

■Wi 

max5/i?3  — y 

Wi 


1+0+0 _  1  —  20+0 _ , QTD 

'  02+p3+a 2(i-/34£2) 
l+0+02+<T2(U-20)+<?‘i 
p*+'T*(l+‘20^p-*)+<x'l(2W0)+vS 

i+j3+g2+/33+g2(aHWg2)+o-‘1(at3^)+cr6 

/34+<T2Cl+2^4e/J2-H/33)+<T4(at6^4€/32)+CT6(3+4/3)+^8 


The 0th order approximation is equivalent to the conventional
matched filter. The first order approximation (cf. [4, Prob.
5.28.d]) is better than the approximate decorrelator analyzed
in [4, p. 281], where the weights were based on a Taylor expansion
and not optimized with respect to the maximum achievable
SIR. The optimum weights can be expressed as Lth order
polynomials in $\beta$ and $\sigma^2$ and calculated recursively. Thus,
their computation is very simple in real-time applications.

1This work was supported by the German Academic Exchange
Service (DAAD) under grant 332 4 00 510 and by the U. S. Army
Research Office under Grant ARO DAAH04-96-1-0379.

For a fair comparison to the performances of the decorrelator,
the MMSE detector, and the matched filter, the spectral
efficiency $\Gamma = \beta C = \beta \log_2(1 + \mathrm{SIR})$ and power
efficiency $E_b/N_0 = 1/(\sigma^2 C)$ are calculated as in [1, 2]. Averaging
capacity results over the load, our results are extended to
re-encoded successive cancellation (SC) receivers [1, 2] via
$\Gamma_{SC}(\beta) = \int_0^{\beta} \log_2(1 + \mathrm{SIR}(\beta')) \, d\beta'$. Fig. 1 shows the spectral
efficiency for fixed $E_b/N_0$ as a function of the load $\beta = K/N$.
At low $\beta$ simple linear receivers noticeably outperform the
single-user matched filter. At high $\beta$ simple nonlinear receivers
noticeably outperform the exact MMSE receiver.


Fig. 1: Spectral efficiency vs. system load for several multiuser
detectors and fixed $10 \log_{10}(E_b/N_0) = 10$ dB.


III.  Conclusion 

Increasing spectral efficiency by multiuser detection need not
involve significant increase in receiver complexity even with
long spreading sequences.
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Abstract  —  In  this  work  we  present  an  approach  for 
evaluating  the  spectral  efficiency  of  a  direct-sequence 
spread-spectrum  system  based  on  the  channel  cutoff 
rate. The spectral efficiency is evaluated independently
of specific forward error control codes, and thus
can  provide  general  insight  into  system  performance 
and  parameter  tradeoffs. 


I.  Introduction 

For equal-power users with known power the effective noise
spectral density in a direct-sequence (DS) spread-spectrum
system is given by [1, 2]

$$I_0 = N_0 + (N - 1) R_b E_b / W_T, \qquad (1)$$


where the first term, $N_0$, represents the thermal noise while
the second term represents the multiple-access interference in
terms of the bit rate per user, $R_b$. We consider the single-user
detection case, with a large number of users. To incorporate
the effects of fading, we assume the energy is unknown and
that the amplitude of the received signal from each user is
subject to Rician-distributed fading, $|z|$, and further impose a
conservation-of-energy constraint so that $E\{|z(t)|^2\} = 1$.

It follows that the signal-to-interference ratio can be written
as

$$E_b/I_0 = \frac{E_b/N_0}{1 + (N - 1)(R_b/W_T)(E_b/N_0)}. \qquad (2)$$

Defining the spectral efficiency as $\eta_N = N R_b / W_T$ bits/sec/Hz
and the total carrier power as $C_T = N R_b E_b$ as in [3], combining
the three equations and normalizing by the total thermal
noise yields the spectral efficiency in terms of the carrier-to-noise
power ratio,

$$\eta_N = \frac{C_T/(N_0 W_T)}{(E_b/I_0)\left[1 + \left(\frac{N-1}{N}\right)\left(C_T/(N_0 W_T)\right)\right]} \ \text{bits/sec/Hz}. \qquad (3)$$


We are particularly interested in the limiting spectral efficiency
in terms of increasing numbers of users, i.e., $N \to \infty$.
A general method for evaluating the spectral efficiency is presented
in the following section.
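The dependence of the spectral efficiency on the carrier-to-noise ratio can be sketched in a few lines. The code assumes that (3) reads $\eta_N = c / \{(E_b/I_0)[1 + \frac{N-1}{N} c]\}$ with $c = C_T/(N_0 W_T)$, which is what follows from (1), (2) and the definitions of $\eta_N$ and $C_T$; all quantities are linear ratios rather than dB, and the function name and arguments are hypothetical.

```python
def spectral_efficiency(cnr, eb_i0, n_users=None):
    """Spectral efficiency in bits/sec/Hz from the assumed form of (3).

    cnr     -- carrier-to-noise power ratio C_T / (N_0 W_T), linear scale
    eb_i0   -- required E_b / I_0, linear scale
    n_users -- number of users N; None selects the limit N -> infinity
    """
    load = 1.0 if n_users is None else (n_users - 1) / n_users
    return cnr / (eb_i0 * (1.0 + load * cnr))
```

In the limit of many users the expression becomes $c / [(E_b/I_0)(1 + c)]$, which saturates at $1/(E_b/I_0)$ as $c$ grows. This is consistent with the observation that raising the carrier-to-noise ratio beyond a certain point yields little additional spectral efficiency.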


II. Cutoff Rate Evaluation of Spectral
Efficiency

The cutoff rate for both MPSK and QAM can be written
in the form

$$R_0 = \log_2 \frac{M}{1 + \frac{1}{M} f_M(R E_b/N_0)} \ \text{bits/channel use}, \qquad (4)$$

where $M$ is the constellation size and $R$ is the coding rate.
Setting $R = R_0$ in (4) leads to the requirement that the value
of $E_b/N_0$ required to operate at this rate be the solution of
$f_M(R E_b/N_0) = (M 2^{-R} - 1) M$. Given $E_b/N_0$ for a specific
channel and modulation scheme, the value of $E_b/I_0$ in (3) is
then taken as $E_b/N_0$.

In Fig. 1 the limiting spectral efficiency as $N \to \infty$ for
DS/MPSK is shown as a function of the carrier-to-noise power
ratio, $C_T/(N_0 W_T)$, for selected channel coding rates. Observe
that for all rates there exists a value of the carrier-to-noise
power ratio, $C_T/(N_0 W_T) \approx 10$, above which increasing the
ratio does not result in a significant gain in the spectral
efficiency. It is also readily observable that the spectral
efficiency is monotonically increasing with increase in error
control coding, i.e., decreasing $R$. However, the spectral efficiency
gains reach a point of diminishing returns at approximately
$R = 0.25$. Similar results are shown for DS/QAM.

Fig. 1: The limiting spectral efficiency for $M = 4$ and selected
rates.

III. Summary

It is shown that in general use of some FEC coding increases
the spectral efficiency of the system. However, regardless of
code rate, performance is optimized for $M = 4$; no significant
performance gains are realized for larger signaling alphabet
sizes.
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Abstract  —  The performance of data-aided channel
estimation algorithms for CDMA systems is analysed.
We compare frame-synchronous and frame-asynchronous
LMMSE channel estimators in large
systems with random spreading.

I.  Signal  Model 

Our starting point is the equation for the chip-matched
filter output vector at time $m$

$$\mathbf{y}(m) = \sum_{k=1}^{K} a_k(m) b_k(m) \mathbf{s}_k(m) + \mathbf{n}(m)$$

where $k \in \{1, \ldots, K\}$ indexes the multiple users, $a_k(m)$ is the
channel gain for user $k$ over symbol period $m$, $b_k(m)$ is the
M-ary PSK data symbol of user $k$ over period $m$, $\mathbf{s}_k(m)$ is the
signature sequence of user $k$ over symbol period $m$ and $\mathbf{n}(m)$
is a circularly symmetric complex white Gaussian noise with
$E[\mathbf{n}(m)] = 0$ and $E[\mathbf{n}(m)\mathbf{n}^H(m)] = \sigma^2 \mathbf{I}$.

The channel gain process for each user is a circularly
symmetric complex Gaussian random process and the processes
for each user are independent with $E[a_k(m)] = 0$ and
$E[a_k(m) a_k^*(m)] = \rho$. We assume that the channel is constant
over time spans corresponding to the frame or block duration
so that within a particular frame of data we can drop
the time dependence. We also assume the channel estimation
starts from scratch at the beginning of every frame, which
means that our a priori information is simply $E[a_k(m)] = 0$
and $E[a_k(m) a_k^*(m)] = \rho$.

The signature sequence $\mathbf{s}_k(m)$ is assumed to be an $N$-dimensional
column vector with independent and identically distributed
elements each being a circularly symmetric complex
Gaussian random variable with zero mean and variance $1/N$.
The random sequences are independent across users and across
symbols.

II. Channel Estimation

Suppose throughout that we are interested in estimating
the channel of user one. If we refer to the channel we are
referring to the channel of user one. We assume we have $\tau$
pilot symbols in every frame for channel estimation and let
$\rho_\tau = \rho/\tau$, $\sigma_\tau = \sigma/\sqrt{\tau}$ and $\alpha_\tau = \alpha/\tau$. The proofs of the
results are omitted; however, Result 2 can be found in [1] and
Result 1 follows using the same techniques (first applied in [2]
for the data estimation problem).

Frame-Asynchronous LMMSE

For this estimator we perform LMMSE estimation of the channel
based on the received signal over the estimation window,
along with the training data of user one. We do not assume
that the data of the interfering users is known to the channel
estimator, so that this algorithm is applicable in the
frame-asynchronous scenario where the training data of the users
does not line up.

Result 1 The MSE for any user converges almost surely as
$N \to \infty$ to the nonrandom $\varepsilon^2 = \rho/(1 + \rho\beta_c)$, where

$$\beta_c = \frac{1 - \alpha}{2\sigma_\tau^2} - \frac{1}{2\rho_\tau} + \left[ \frac{(1 - \alpha)^2}{4\sigma_\tau^4} + \frac{1 + \alpha}{2\rho_\tau\sigma_\tau^2} + \frac{1}{4\rho_\tau^2} \right]^{1/2}$$

The effect of the estimation window length is that the background
noise power and the interference power are reduced by
$\tau$ relative to the $\tau = 1$ case.

Frame-Synchronous LMMSE

In this case we assume we know the data of all users over
the estimation window and perform LMMSE estimation conditioned
on this information. We thus require that the training
interval or estimation window of all users is aligned. In
this case the resulting channel estimate is the MMSE estimate
since the problem is one of Gaussian estimation.

Result 2 The MSE for any user converges almost surely as
$N \to \infty$ to the nonrandom $\varepsilon^2 = \rho/(1 + \rho\beta_c)$, where

$$\beta_c = \frac{1 - \alpha_\tau}{2\sigma_\tau^2} - \frac{1}{2\rho} + \left[ \frac{(1 - \alpha_\tau)^2}{4\sigma_\tau^4} + \frac{1 + \alpha_\tau}{2\rho\sigma_\tau^2} + \frac{1}{4\rho^2} \right]^{1/2}$$

For this receiver we see that, along with the background
noise being reduced by $\tau$, the effective spreading gain is increased
by $\tau$. The alignment of the pilot symbols of all
users means that we can form effective spreading sequences
of interferers by piecing together the (modulated) spreading
sequences from all the pilot symbols. This property can
lead to very large performance improvements over the
frame-asynchronous LMMSE estimator.

III.  Conclusions 

In this work, we analyse the performance of multiuser channel
estimation algorithms for CDMA systems. One evident point
is that there are significant gains from knowing the
data of all users over the estimation window. The results we
have presented can be extended to frequency-selective fading
and to handle non-equal average powers.
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Abstract  —  We consider the problem of one-dimensional
parameter transmission over the Poisson
channel when the input signal (intensity) obeys a peak energy
constraint. We show that it is possible to choose
input signals and an estimator in such a way that the
mean-square error (or, more generally, the $\alpha$-mean error
for loss function $|z|^\alpha$) of parameter transmission
will decrease exponentially with transmission time
$T \to \infty$, and we find the best possible exponent, if
$\alpha > \alpha_0 = (1 + \sqrt{5})/2 \approx 1.618$.

I.  Statement  of  the  problem 

Let $\Theta = [0, 1]$ be the parameter (to be transmitted) set. We
assume that an input signal (intensity function) $S(\theta, t)$ of the Poisson
channel satisfies only the peak energy constraint:

$$0 \le S(\theta, t) \le A \quad \text{for any } \theta \in [0, 1], \ 0 \le t \le T, \qquad (1)$$

where $A > 0$ is some given constant.

Thus, if $\theta_0$ is the true value of parameter $\theta$ then the observation
process at the channel output $X(t)$, $0 \le t \le T$,
is a random process with independent increments such that
$X(0) = 0$ and for any $0 \le t_1 < t_2 \le T$

$$\Pr\{X(t_2) - X(t_1) = j\} = \frac{1}{j!} \left( \int_{t_1}^{t_2} S(\theta_0, t) \, dt \right)^{j} \exp\left\{ -\int_{t_1}^{t_2} S(\theta_0, t) \, dt \right\}, \quad j = 0, 1, \ldots$$

Introduce function $d(\alpha, T)$, giving the minimal possible $\alpha$-mean
error for the best estimator $\hat{\theta}_T$ and the best chosen
signals $S(\theta, t)$ when parameter $\theta$ takes values from the set
$\Theta = [0, 1]$:

$$d(\alpha, T) = \inf_{S(\cdot,\cdot)} \inf_{\hat{\theta}_T} \sup_{\theta \in \Theta} E_\theta |\hat{\theta}_T - \theta|^{\alpha}, \quad \alpha > 0,$$

where $\inf_{S(\cdot,\cdot)}$ is taken over all signals $S(\cdot, \cdot)$ satisfying
constraint (1).

We are interested in the asymptotic behavior of function
$d(\alpha, T)$ for large $T$. Since it decreases exponentially when
$T \to \infty$, we introduce also function $e(\alpha)$, giving the best possible
exponent for the $\alpha$-mean error

$$e(\alpha) = \lim_{T \to \infty} \left\{ -\frac{1}{AT} \ln d(\alpha, T) \right\}, \quad \alpha > 0. \qquad (2)$$

II. Main result

The following theorem presents the main result of the paper.

Theorem. If $\alpha > \alpha_0 = (1 + \sqrt{5})/2 \approx 1.618$ then

$$e(\alpha) = \frac{\alpha}{4(1 + \alpha)}. \qquad (3)$$

In other words, if $\alpha > \alpha_0$ then for $T \to \infty$

$$\inf_{S(\cdot,\cdot)} \inf_{\hat{\theta}_T} \sup_{\theta \in [0,1]} E_\theta |\hat{\theta}_T - \theta|^{\alpha} = \exp\left\{ -\frac{\alpha A T (1 + o(1))}{4(1 + \alpha)} \right\},$$

where $\inf_{S(\cdot,\cdot)}$ is taken over all signals $S(\cdot, \cdot)$ satisfying
constraint (1).

Clearly, $e(2) = 1/6$ determines the best exponential rate
for the mean-square error.

Remarks. 1) Function $e(\alpha)$ is very similar to the reliability
function $E(R)$ of the Poisson channel [5], [3]. Using function $E(R)$
we get the lower bound for function $e(\alpha)$. On the other hand,
knowing function $e(\alpha)$ we can get the exact upper bound for
function $E(R)$, which is the most difficult part in finding the
function $E(R)$.

2) In the case of the white Gaussian noise channel a similar
problem was solved in [1, 2]. Moreover, a number of optimal
results known for the white Gaussian noise channel have been
obtained recently for the Poisson channel as well [5, 3]. In that
respect, the paper also extends results of [1, 2] to the Poisson
channel. A common feature of all papers [5, 3] and this one is
that the Poisson channel turns out to be simpler than the
Gaussian one.

1This work was supported by Grant N 98-01-04108 from the
Russian Fund for Fundamental Research.
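The exponent in the theorem is easy to evaluate. The sketch below assumes the theorem reads $e(\alpha) = \alpha/(4(1 + \alpha))$ for $\alpha > \alpha_0$ and that the normalization in (2) is $1/(AT)$; the function names are hypothetical, and the second function drops the $o(1)$ term, so it is only a first-order guide for large $T$.

```python
import math

# Threshold from the theorem: alpha_0 = (1 + sqrt(5)) / 2, about 1.618.
ALPHA_0 = (1.0 + math.sqrt(5.0)) / 2.0


def error_exponent(alpha):
    """Best exponent e(alpha) = alpha / (4 (1 + alpha)), stated for alpha > alpha_0."""
    if alpha <= ALPHA_0:
        raise ValueError("the theorem only covers alpha > alpha_0")
    return alpha / (4.0 * (1.0 + alpha))


def approx_alpha_mean_error(alpha, peak, duration):
    """First-order estimate exp(-e(alpha) * A * T), ignoring the o(1) term."""
    return math.exp(-error_exponent(alpha) * peak * duration)
```

For the mean-square error, `error_exponent(2.0)` evaluates to 1/6, in agreement with the value quoted above.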
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Abstract  —  In this paper, a new recursive algorithm
for ARMA modeling of uniformly sampled signals
with missing observations is proposed. This algorithm
enables real-time processing and may be used
for time and frequency domain reconstruction.

This paper addresses the problem of statistical inference
concerning time series from missing data. Several restoration
methods, including parametric estimation methods in the
presence of incomplete data, can be found in [1], [2], [3]. All
those methods deal only with stationary signals while the proposed
method is also suited to non-stationary ones.

An ARMA adaptive predictor is used. It has been adapted
to the non-uniform sampling context by replacing
each missing value by its estimate. Because of the missing
observations, a non-linear optimization criterion is required in order
to estimate the model parameters. The optimum is reached
by means of an LMS-like algorithm adapted to this sampling
context.
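The idea of filling each missing sample with its own prediction can be sketched with a much simpler predictor than the one proposed here. The code below is an AR-only, normalized-LMS illustration and is not the ARMA algorithm of this paper: the function name, the step size, and the rule of skipping the weight update whenever the regressor contains an estimate are simplifying assumptions made for the example.

```python
def reconstruct_ar(samples, order=2, mu=1.0, eps=1e-12):
    """Fill missing samples (None) with one-step AR predictions.

    The AR weights are adapted by normalized LMS, using only time steps
    where the current sample and all regressor samples were observed.
    Returns the filled sequence and the final weights.
    """
    w = [0.0] * order
    filled = []      # observed values, or predictions where missing
    observed = []    # True where the sample was actually observed
    for t, x in enumerate(samples):
        have_reg = t >= order
        reg = filled[-1:-order - 1:-1] if have_reg else []  # most recent first
        pred = sum(wi * ri for wi, ri in zip(w, reg)) if have_reg else 0.0
        if x is None:
            filled.append(pred)       # replace the missing value by its estimate
            observed.append(False)
            continue
        if have_reg and all(observed[-order:]):
            err = x - pred
            norm = eps + sum(r * r for r in reg)
            w = [wi + mu * err * ri / norm for wi, ri in zip(w, reg)]
        filled.append(x)
        observed.append(True)
    return filled, w
```

On a noiseless sinusoid, which satisfies an exact second-order recursion, this sketch recovers isolated missing samples once the weights have converged. Noisy or non-stationary data would call for a smaller step size, and an MA part would require the full algorithm.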

A  low-pass  ARMA  (2,2)  signal  is  generated  as  the  output 
of  an  elliptic  filter  in  order  to  test  the  performances  of  the 
proposed  algorithm  for  both  AR  and  MA  parts.  Figure  1 
shows  the  reconstructed  signal  for  only  one  realization  of  the 
sampling  process  in  the  case  where  20%  of  the  samples  are 
lost. 


Fig.  1:  Original  ( — )  and  reconstructed  (...)  signal,  missing 
samples  (++) 


Figure 2 shows a good agreement between original and estimated
PSDs for different values of probability p. The proposed
method leads to far better performances for the spectral
estimator than a classical off-line method [4] for ARMA
identification of uniformly sampled signals, even in the case where
20% of the samples are lost.


Fig.  2:  Original  PSD  ( — ),  estimated  PSD:  classical  off-line 
method  (-  -),  proposed  method:  periodic  sampling  p  =  1  (-.-), 
missing  samples  p  =  0.8  (...) 


Data compression may be achieved by means of non-periodic
transmission [5]. The proposed algorithm is an answer
to the need for an efficient reconstruction algorithm in the
receiver in the case of ARMA modeled signals (for instance
speech coding).
Abstract — We use unreliable system replicas and
unreliable voters to construct redundant dynamic systems
that tolerate transient failures in their state
transition and error correcting mechanisms. Using
low density parity check (LDPC) codes, we develop a
fault-tolerant scheme that efficiently protects linear finite
state machines (LFSM's) with identical dynamics
but distinct input sequences and states. The scheme
achieves a probability of failure that remains below
any given bound for any pre-specified (finite) time
interval using a constant amount of hardware (XOR
gates and voters) per LFSM.

I.  Introduction 

A dynamic system evolves according to an internal
state that influences its future states/outputs. The effect
of a transient failure in the state transition mechanism
may last over several time steps (even though the cause
does not persist). A dynamic system (e.g., a finite-state
machine) in which the probability of making a transition
to an incorrect next state is $p_s$, independently between
different time steps, follows the correct state trajectory
for $L$ time steps with probability $(1 - p_s)^L$. A
common solution is to use modular redundancy with feedback:
a voter feeds back to all systems the state agreed
upon by the majority of them. If the voter fails with
probability $p_v$, this approach will not work: after $L$ time
steps, the probability that the system has followed the
correct state trajectory is at best $(1 - p_v)^L$.

II.  Fault-Tolerant  Scheme 

Consider a variant of modular redundancy that uses
$n$ system replicas (initialized at the same state and supplied
with the same inputs) and $n$ voters, each of which
receives "ballots" from all $n$ systems and feeds back a
correction to only one of the systems. Since a fault-free
voter recovers the correct state of the underlying dynamic
system as long as more than half of the $n$ systems are in
the correct state, our (conservative) goal is to ensure that,
with high probability, the fault-tolerant implementation
has no overall failure (i.e., it operates with at least $\lceil n/2 \rceil$
systems in the correct state at any given time step).
Theorem [1]: Suppose each system takes a transition to
an incorrect state with probability $p_s$ and each voter feeds
back an incorrect state with probability $p_v$ (independently
between systems, voters and time steps). The probability
of an overall failure at or before time step $L$ is bounded
above by $L \sum_{i=\lfloor n/2 \rfloor}^{n} \binom{n}{i} p^i (1 - p)^{n-i}$, where $p = p_v + (1 - p_v) p_s$.
This probability goes down exponentially with
the number of systems $n$ if and only if $p < 1/2$.
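
To get a feel for how the bound behaves, it can be evaluated numerically. The sketch below assumes the bound has the binomial-tail form $L \sum_{i=\lfloor n/2 \rfloor}^{n} \binom{n}{i} p^i (1-p)^{n-i}$ with $p = p_v + (1 - p_v) p_s$; the function name `failure_bound` is hypothetical.

```python
from math import comb

def failure_bound(L, n, p_s, p_v):
    """Upper bound on the probability of an overall failure by time step L.

    Assumes the binomial-tail form of the bound, with the sum starting
    at floor(n/2) corrupted systems.
    """
    # Probability that one system ends a step in an incorrect state.
    p = p_v + (1 - p_v) * p_s
    tail = sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n // 2, n + 1))
    return L * tail
```

For small failure probabilities such as `p_s = p_v = 0.01`, calling `failure_bound(100, n, 0.01, 0.01)` for increasing `n` illustrates the exponential decay in the number of replicas.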

An LFSM is an FSM with state evolution
$q_s[t+1] = A q_s[t] \oplus b\,x[t]$, where $q_s[t]$ is the
$d$-dimensional state vector, $x[t]$ is the input, and $A$, $b$ are
constant matrices of appropriate dimensions (all vectors
and matrices have entries from GF(2)). If we take $k$
such LFSM's and let them run in parallel (each with different
initial states and different input streams), we get

$$[\, q_1[t+1] \;\cdots\; q_k[t+1] \,] = A\,[\, q_1[t] \;\cdots\; q_k[t] \,] \oplus b\,[\, x_1[t] \;\cdots\; x_k[t] \,].$$

If we post-multiply both sides of the above equation
by $G^T$ (where $G$ is an $n \times k$ encoding matrix of a
linear code), we get the following $n$ encoded parallel
instantiations

$$[\, \xi_1[t+1] \;\cdots\; \xi_n[t+1] \,] = A\,[\, \xi_1[t] \;\cdots\; \xi_n[t] \,] \oplus b \underbrace{\left([\, x_1[t] \;\cdots\; x_k[t] \,]\, G^T\right)}_{e(x_1[t],\, x_2[t],\, \ldots,\, x_k[t])}.$$
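
The encoding step relies only on linearity over GF(2): encoding the states and inputs and then stepping gives the same result as stepping and then encoding. A minimal sketch with a toy system follows; the matrices `A`, `b`, `G`, `Q`, `X` and the helper names are illustrative assumptions, not values from the text.

```python
def mat_mul(a, b):
    """Matrix product over GF(2); matrices are lists of rows of 0/1 ints."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]

def mat_add(a, b):
    """Entrywise XOR of two equally sized GF(2) matrices."""
    return [[x ^ y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def lfsm_step(A, b, states, inputs):
    """One step for machines stored column-wise: A*states XOR b*inputs."""
    return mat_add(mat_mul(A, states), mat_mul(b, inputs))

# Toy example (assumed values): d = 2, k = 2, n = 3, single parity-check G.
A = [[0, 1], [1, 1]]
b = [[1], [0]]
G = [[1, 0], [0, 1], [1, 1]]
Gt = transpose(G)
Q = [[1, 0], [0, 1]]   # d x k matrix of current states
X = [[1, 0]]           # 1 x k row of current inputs

step_then_encode = mat_mul(lfsm_step(A, b, Q, X), Gt)
encode_then_step = lfsm_step(A, b, mat_mul(Q, Gt), mat_mul(X, Gt))
```

Both orders produce the same $d \times n$ matrix of encoded states, which is why the $n$ redundant machines can run the original dynamics unchanged.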

We have $n$ LFSM's performing $k$ different encoded
instantiations of the given LFSM. We employ LDPC codes
(with $K$ "1's" in each row and $J$ "1's" in each column of
their parity check matrix) and use the approach in [2] to
perform error-correction (each bit can be corrected via a
mechanism that uses unreliable XOR gates and unreliable
voters).

Theorem [1]: Assume that the 2-input XOR gates fail
with probability $p_x$, and the $(J-1)$-bit voters fail with
probability $p_v$. Let $J$ be a fixed even integer greater than 4,
let $K$ be an integer greater than $J$, and let $p$ be such that

$$p > \binom{J-1}{J/2} \left[(K-1)(2p + 3p_x)\right]^{J/2} + p_v + p_x.$$ Then

there  exists  a  sequence  of  ( n ,  k)  LDPC  codes  such  that  the 
probability  of  an  overall  failure  at  or  before  time  step  L 
is  bounded  above  by  LdCk~ &  where 
n  log{(J-l)(K-l)(  j  [T  —  \  )[(*'-U(2p+3Pl)]J/2-1} 

P  -  2  log[(  J  —  1 )( A"  —  1 )]  6 

f  n-(/3+3) 

C  =  (l- j/kA  [277  “  27(77— Tj J 

The  code  redundancy  is  j  <  l_)/K  and  the  hardware 
used  per  LFSM  (including  the  error- correcting  mecha¬ 
nism)  is  bounded  above  by  a  constant. 
Abstract — It is known that in the absence of distortion,
the minimum average sampling density for a
multiband signal is given by its spectral occupancy [1].
Furthermore, there exist nonuniform sampling patterns
of the same average sampling density such that
reconstruction is feasible even if the actual spectral
support of the multiband signal is unknown [2]. This
is called spectrum-blind nonuniform sampling. However,
if the samples are distorted, an increased sampling
density may lead to superior reconstruction.

Suppose that a fidelity criterion is imposed on the
reconstruction. To satisfy this, it is necessary to sample at an
increased density. In this paper, we consider additive noise
distortion of the samples, and the fidelity criterion is the
probability that the spectral support is correctly reconstructed.
In [3], we consider samples distorted by quantization, with a
mean-square reconstruction error fidelity criterion.

I.  Nonuniform  Sampling 

Consider a complex-valued length-$N$ sequence $x \in \mathbb{C}^N$
with discrete Fourier transform (DFT) $X \in \mathbb{C}^N$, where
$X(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n) e^{-j 2\pi n m / N}$. Let $x$ be a multiband
sequence of spectral occupancy $q/N$, i.e. let $X$ have (at
most) $q$ non-zero components in arbitrary locations, indexed
by $K = \{k_1, \ldots, k_q\}$, where $k_i \in \{0, \ldots, N-1\}$. The spectral
occupancy for this vector is $\Omega = q/N$. Define the vector $x_c$
containing only $p$ of the $N$ components of $x$, at locations
indexed by $c = \{c_1, \ldots, c_p\}$. These are the nonuniform samples,
with average sampling density $\rho = p/N$. In matrix notation,
we can write $x_c = A_{c,K} S$. Here, $S$ contains the $q$ non-zero
components of $X$, and $A_{c,K}$ is the submatrix of the inverse
DFT matrix that is obtained by only retaining the rows with
indices in $c$ and the columns with indices in $K$. We consider
the case of distorted samples $y_c = x_c + z = A_{c,K} S + z$, where
$z \sim \mathcal{N}_c(0, \sigma^2 I)$ is (complex) white Gaussian noise.

II.  Necessary  Sampling  Density 

Let the location of the $q$ nonzero components of $X$ be
distributed uniformly over all possibilities, and let their
(complex) values be distributed as circularly normal,
$S \sim \mathcal{N}_c(0, \sigma_s^2 I)$. We define the signal-to-noise ratio (SNR)
$\beta = \sigma_s^2 / \sigma^2$. It can be shown [4] that for $z = 0$, there exist
sampling patterns with sampling density $\rho = \Omega + 1/N$ allowing,
w.p. 1, perfect reconstruction of $x$ from $y_c$.

We derive a necessary condition for the optimal sampling
density for $z \neq 0$. It follows from considering mutual
informations. We start by noting that by the data processing lemma,

1 This work was supported in part by NSF Grant No. MIP
97-07633 and DARPA Contract F49620-98-1-0498, administered by
AFOSR.
$I(x_c; y_c) \geq I((S, K); y_c) = I(K; y_c) + I(S; y_c \mid K)$, which yields

$$\max I(x_c; y_c) \geq \max_{\{A_{c,K}\}} \left\{ I(K; y_c) + I(S; y_c \mid K) \right\}, \qquad (1)$$

where first, the max is taken on both sides over all sets
$\{A_{c,K}\}$ of matrices satisfying $(A_{c,K} A_{c,K}^*)_{ii} = \Omega$ (which
preserves $E|x_c(i)|^2 = \Omega \sigma_s^2$); then, on the LHS, the max is taken
over all distributions of $x_c(i)$ for which $E|x_c(i)|^2 = \Omega \sigma_s^2$ as
for the true $x_c(i)$. The term on the left in Eqn. (1) is simply
the capacity of a (complex) additive white Gaussian noise
(AWGN) channel with input power constraint $\Omega \sigma_s^2$ and additive
noise variance $\sigma^2$, thus $\max I(x_c; y_c) = p \log_2(1 + \Omega\beta)$.

Next, consider $I(K; y_c)$ in Eqn. (1). This is the mutual
information across the digital channel from $K$ to $y_c$. A lower
bound on the mutual information follows from Fano's inequality,
$I(K; y_c) \geq H(K) - H_b(P_e) - P_e \log_2\left(\binom{N}{q} - 1\right)$.

Last, consider $I(S; y_c \mid K)$ in Eqn. (1). It is the mutual
information across the channel between $S$ and $y_c$. This is also a
Gaussian channel, but its input is not i.i.d. The achieved mutual
information is found by averaging over all $K$ as
$I(S; y_c \mid K) = E_K \log_2 \det\left(I_q + \beta A_{c,K}^* A_{c,K}\right)$. For each $K$, the maximum over
$A_{c,K}$ subject to the aforementioned constraint is achieved (by
the geometric-arithmetic mean inequality) by $A_{c,K}$ that has
orthogonal columns, yielding $I(S; y_c \mid K) = q \log_2(1 + \beta\rho)$. This
proves the following:

Theorem (Necessary Condition). The optimal sampling
density $\rho = p/N$ has to satisfy

$$\rho \log_2(1 + \beta\Omega) \geq \frac{1}{N}\left[\log_2\binom{N}{q} - H_b(P_e) - P_e \log_2\left(\binom{N}{q} - 1\right)\right] + \Omega \log_2(1 + \beta\rho).$$

Letting $N \to \infty$ in the theorem, we obtain

$$\rho \log_2(1 + \beta\Omega) \geq \Omega \log_2(1 + \beta\rho) + (1 - P_e) H_b(\Omega), \qquad (2)$$

which is sharp in the limit $\beta \to \infty$, because it reduces to
$\rho \geq \Omega$. For finite SNR $\beta$, $\rho > \Omega$, with the excess density
given by (2).
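
The excess density implied by the asymptotic condition can be found numerically. The sketch below assumes the condition reads $\rho \log_2(1 + \beta\Omega) \geq \Omega \log_2(1 + \beta\rho) + (1 - P_e) H_b(\Omega)$ and searches for the smallest admissible $\rho$ in $[\Omega, 1]$ by bisection; the function names are hypothetical, and $P_e < 1$ is assumed.

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def density_gap(rho, omega, beta, p_e):
    """Left side minus right side of the asymptotic necessary condition."""
    return (rho * math.log2(1 + beta * omega)
            - omega * math.log2(1 + beta * rho)
            - (1 - p_e) * binary_entropy(omega))

def min_density(omega, beta, p_e=0.0, tol=1e-12):
    """Smallest rho in [omega, 1] meeting the condition, or None.

    The gap is convex in rho and negative at rho = omega, so it crosses
    zero at most once on the interval and bisection applies.
    """
    if density_gap(1.0, omega, beta, p_e) < 0:
        return None  # condition cannot be met even with full sampling
    lo, hi = omega, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if density_gap(mid, omega, beta, p_e) < 0:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, `min_density(0.25, 10.0)` and `min_density(0.25, 1e6)` can be compared to see the required density shrink toward the occupancy as the SNR grows.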
I. Introduction

Calderbank and McGuire discovered 2 remarkable $\mathbb{Z}_4$-linear
codes [2], [3]. The binary Gray images of these codes have
respective parameters $(64, 2^{37}, 12)$ and $(64, 2^{32}, 14)$ and thus
have 2 (resp. 4) times as many code words as the best known
linear codes of the same length and minimum distance.

A decoding algorithm for the 5-error-correcting code is
given in [4]. The approach there (following the ideas of the
pioneers of $\mathbb{Z}_4$-codes) is to split the study into several cases
according to the Lee type of the error vector. Then the Galois
ring algebra is used to decide whether the syndromes are
compatible with an error vector of the prescribed type.
Unfortunately, it seems to be very difficult to apply this method to
the case of the six-error-correcting code. A different approach
(presented as an alternative in [4]) is required.

Using the ideas presented here it is also easy to develop a
list decoding algorithm for the 5-error-correcting code. I will
discuss this possibility.

II.  Outline  of  the  decoding  algorithm 

The 6-error-correcting Calderbank-McGuire code $C$ is a
submodule of $\mathbb{Z}_4^{32}$. The code is defined by BCH-like parity
checks involving the elements of the Teichmüller set inside
the Galois ring $GR(4^5)$. If $\beta$ is a generator of the non-zero
Teichmüller elements, the code $C$ is defined by the following
$GR(4^5)$-valued parity checks

$$H = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & \beta & \beta^2 & \cdots & \beta^{30} \\ 0 & 1 & \beta^3 & \beta^6 & \cdots & \beta^{90} \\ 0 & 1 & \beta^5 & \beta^{10} & \cdots & \beta^{150} \end{pmatrix}.$$

We remark that a parity check for the 5-error-correcting
Calderbank-McGuire code is obtained from the above matrix
$H$ simply by multiplying the last row by 2.

We express words $x$ of $C$ 2-adically, i.e. $x = u(x) + 2v(x)$,
where $u$ and $v$ are binary vectors of length 32. Using $H$ it is
easy to see (cf. [3]) that here $u$ must be a word of the
Reed-Muller code $R(2,5)$ and that each $u \in R(2,5)$ determines a
coset $f(u) + R(2,5)$ with the property that $u + 2v \in C$ if and
only if $v \in f(u) + R(2,5)$. One of the reasons why $C$ has
such nice distance properties is that $f(u) + R(2,5)$ is actually
in $R(3,5)/R(2,5)$ whenever $u$ is a vector of minimum weight
8.

We can similarly write any error vector $e$ in the form
$e = u(e) + 2v(e)$ with binary $u$ and $v$. Here simple key
observations are that in order to get an error vector $e$ of Lee
weight at most 6, the Hamming weight of $u$ must not exceed
6. Furthermore, if the Hamming weight of $u$ is 5 or
6, then the support of $v$ must be contained in the support
of $u$. Another useful observation is that, if we also consider
$-e = u(e) + 2(v(e) + u(e))$, we see that either $v(e)$
or $v(-e) = v(e) + u(e) \pmod 2$ has Hamming weight at
most 3.
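
The observations above are elementary facts about $\mathbb{Z}_4$ vectors and can be checked directly. The following is a minimal sketch; the function names are hypothetical and the code only illustrates the 2-adic bookkeeping, not the decoder itself.

```python
def split_2adic(e):
    """Split a Z4 vector e into binary vectors (u, v) with e = u + 2v."""
    u = [x % 2 for x in e]
    v = [(x % 4) // 2 for x in e]
    return u, v

def lee_weight(e):
    """Lee weight over Z4: symbols 1 and 3 count 1, symbol 2 counts 2."""
    return sum(min(x % 4, 4 - x % 4) for x in e)

def hamming_weight(bits):
    return sum(1 for x in bits if x)

def check_observations(e):
    """Assert the three observations for a Z4 error vector e."""
    u, v = split_2adic(e)
    _, v_neg = split_2adic([(-x) % 4 for x in e])
    # v(-e) = v(e) + u(e) (mod 2), while u(-e) = u(e).
    assert v_neg == [(a + b) % 2 for a, b in zip(v, u)]
    if lee_weight(e) <= 6:
        assert hamming_weight(u) <= 6
        if hamming_weight(u) >= 5:
            # No room left for a symbol 2, so supp(v) lies inside supp(u).
            assert all(ui == 1 for ui, vi in zip(u, v) if vi == 1)
        # wt(v(e)) + wt(v(-e)) equals the Lee weight, so one is at most 3.
        assert min(hamming_weight(v), hamming_weight(v_neg)) <= 3
    return True
```

The last assertion follows because $v(e)$ marks the symbols 2 and 3 while $v(-e)$ marks the symbols 2 and 1, so the two Hamming weights add up to the Lee weight of $e$.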

So given a received vector $y = u(y) + 2v(y) = x + e$, $x \in C$,
we decode it as follows. First reduce the $\mathbb{Z}_4$-valued
components modulo 2 and then decode the resulting vector $u(y)$
with a full decoding algorithm for the code $R(2,5)$. We require
a decoding algorithm that lists all possible
error patterns of weight at most 6, i.e. all the words of weight
at most 6 that lie in the coset $u(y) + R(2,5)$. These are then
the candidates for the vector $u(e)$. We then process the list
and for all candidates $u$ try to find a matching $v$ taking all
the above observations into account.

III. Full decoding of R(2,5)

A complete decoding algorithm for $R(2,5)^*$ has been given
by Seroussi and Lempel [5]. It is based on an earlier binary
matrix factorization algorithm due to Lempel. I will describe
that algorithm as an application of the theory of symmetric
bilinear forms over a binary vector space. It is quite simple
to extend their algorithm to a complete decoding algorithm
for $R(2,5)$. The covering radius of $R(2,5)$ is six. So after this
stage we get a single candidate $u$, namely a coset leader of
$u(y) + R(2,5)$.

The weight distribution of all the cosets of $R(2,5)$ has been
determined by Berlekamp and Welch [1]. From their data one
sees that any coset has at most 8 words of weight at most
5 and at most 35 words of weight 6. Altogether our list of
candidates $u$ may contain up to 36 elements. Luckily, simple
applications of affine geometry allow us to find all the low
weight words in a coset when a leader is known. A relatively
efficient way of achieving this is to precompute look-up tables
of low weight words for certain standard coset leaders (one for
each orbit of the group of affine transformations) and modify
the Lempel-Seroussi algorithm to always reduce into one of
the standard cases. Thus we can meet all the requirements of
the main algorithm and hence successfully decode all the error
patterns of Lee weight at most 6.
Abstract — We present an algebraic decoding algorithm
for all $\mathbb{Z}_4$-linear Goethals-like codes $C_k$ introduced
by Helleseth et al. We show how Dickson polynomials
can be used to solve syndrome equations.


I.  Introduction 

Let $m$ be an odd integer and let $\mathbb{Z}_4$ denote the ring of
integers modulo 4. Let also $R = GR(4, m)$ be a Galois ring
of characteristic 4 with $4^m$ elements. The group of units in $R$
contains a unique cyclic subgroup $T = \{0, 1, \beta, \ldots, \beta^{2^m-2}\}$ of
order $2^m - 1$. Every element of $R$ can be expressed uniquely
in the form $A + 2B$ where $A, B \in T$. We have the natural
modulo 2 reduction map $\mu : R \to F$ where $F$ is a finite field
of order $2^m$. The Gray map $\phi : \mathbb{Z}_4^{2^m} \to \mathbb{F}_2^{2^{m+1}}$, defined by
$\phi(0) = 00$, $\phi(1) = 01$, $\phi(2) = 11$ and $\phi(3) = 10$, maps a
$\mathbb{Z}_4$-codeword componentwise to a binary codeword.
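
The Gray map as defined above turns Lee weight into Hamming weight, which is why Lee distance is the natural metric for these codes. A minimal sketch (the names `GRAY`, `gray_map` and `lee_weight` are illustrative):

```python
# Symbol table taken from the definition: 0 -> 00, 1 -> 01, 2 -> 11, 3 -> 10.
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def gray_map(word):
    """Map a Z4 word componentwise to a binary word of twice the length."""
    bits = []
    for symbol in word:
        bits.extend(GRAY[symbol % 4])
    return bits

def lee_weight(word):
    """Lee weight over Z4: symbols 1 and 3 count 1, symbol 2 counts 2."""
    return sum(min(s % 4, 4 - s % 4) for s in word)
```

For any $\mathbb{Z}_4$ word `w`, `sum(gray_map(w))` equals `lee_weight(w)`.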

Helleseth, Kumar and Shanbhag [1] observed that the
$\mathbb{Z}_4$-linear codes $C_k$ with parity-check matrices

$$H_k = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & \beta & \beta^2 & \cdots & \beta^{2^m-2} \\ 0 & 2 & 2\beta^{2^k+1} & 2\beta^{(2^k+1)2} & \cdots & 2\beta^{(2^k+1)(2^m-2)} \end{pmatrix}$$

have the same weight distributions whenever $\gcd(k, m) = 1$.
They have $2^{2^{m+1}-3m-2}$ codewords and minimum Lee distance
8. The Gray images $\phi(C_k)$ of these codes are nonlinear binary
codes which have the same Hamming weight distribution as
the Goethals code. Helleseth and Kumar [2] presented a complete
decoding algorithm for the code $C_1$. In this talk we
sketch an algebraic decoding algorithm for all codes $C_k$, which
corrects errors with Lee weight $\leq 3$.


Theorem 1. Let $S = (1, A + 2B, 2C)$ denote the syndrome of
a coset.

(i) If $b = 0$ and $c = a^{2^k+1}$, then the coset leader has Lee
weight 1 and is uniquely determined by $x = a$ and $e_x = 1$.

(ii) If $b \neq 0$ and $c = a^{2^k+1}$, then the coset leader has Lee
weight 3 and is uniquely determined by $x = a + b$, $e_x = 2$,
$y = a$, and $e_y = 3$.

(in)  Ifb  ^  0,  c  yt  a2  +1  and  Tr(-|t,--^  )  =  0,  then  the  coset 
leader  has  Lee  weight  3.  The  coset  leader  is  uniquely 
determined  by  ex  —  ey  =  1,  ez  =  3,  D2k_1(z  +  a,b2)  = 

a  ~*~c  and  x  and  y  are  the  zeros  of  T2  +  (z  +  a)T  + 
b2  +  az  =  0.  In  particular  in  the  case  k  =  2  the  variable 
z  should  satisfy  ( z  +  a)3'+  b2(z  +  a)  =  a°  j~c  . 

(tv)  Ifb  ±  0,  c  ±  a2k+1  andp(T)  =  T3+aT2  +  (a2+b2)T+a3 
has  three  distinct  zeros  in  F  where  <j3  satisfies  Dn  (a 3  + 

a3 +  ab2,b6)  =  a2k+d1+c  and 

jn—  2  +1  and  d  =  1  if  2  {  k 
~  and  d  —  b 2  if  2\k, 

then  a  coset  leader  has  Lee  weight  3  and  is  uniquely 
determined  such  that  x,y,z  are  the  three  distinct  ze¬ 
ros  of  p(T)  in  F  and  ex  —  ey  =  ez  =  3.  Espe¬ 
cially,  when  k  —  2  the  condition  for  <73  can  be  stated 
as  =  c+a*+aH>  +  ab* 

(v)  If  none  of  (i)-(iv)  hold,  then  any  coset  leader  has  Lee 
weight  >  5. 


II.  Dickson  polynomials 

In the decoding procedure we solve the roots of an equation
$D_n(x, u) = v$, where $\gcd(n, 2^m - 1) = 1$ and

$$D_n(x, u) = \sum_{i=0}^{\lfloor n/2 \rfloor} \frac{n}{n-i} \binom{n-i}{i} (-u)^i x^{n-2i}$$

is a Dickson polynomial. It satisfies the functional equation
$D_n(x + y, xy) = x^n + y^n$, which implies that the roots can be
solved effectively by Cardan's method.
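
The functional equation is easy to check numerically. The sketch below evaluates the Dickson polynomial over the integers using the standard explicit sum (an assumption here, since the displayed formula is damaged in the source); the function name `dickson` is hypothetical. Over a field of characteristic 2 the same coefficients are simply reduced modulo 2.

```python
from math import comb

def dickson(n, x, u):
    """Evaluate the Dickson polynomial D_n(x, u) for integers, n >= 1."""
    total = 0
    for i in range(n // 2 + 1):
        # n/(n-i) * C(n-i, i) is always an integer, so the division is exact.
        coeff = n * comb(n - i, i) // (n - i)
        total += coeff * (-u) ** i * x ** (n - 2 * i)
    return total
```

For instance, `dickson(3, x, u)` equals `x**3 - 3*u*x`, which reduces to $x^3 + ux$ in characteristic 2.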

For  further  details  see  for  example  the  survey  [3]. 


III.  Main  results 

Let $X, Y, Z, A, B, C$ denote elements in $T$ and $x, y, z, a, b, c$
their images in $F$ under the $\mu$-mapping. The syndrome of the
error vector $e \in \mathbb{Z}_4^{2^m}$ is $S = e H_k^T = (t, A + 2B, 2C)$, where
$t \in \mathbb{Z}_4$. The decoding algorithm in [2] can be straightforwardly
generalized in the cases $t = 0$ and $t = 2$ but in the case $t = \pm 1$
we need the Dickson polynomials.
Abstract —

We give a new algorithm for the solution of the
Hamming metric decoding problem for alternant
codes over a Galois ring $R$. First we develop a
comprehensive theory of Gröbner bases over $R[x_1, \ldots, x_n]$,
which is of independent interest. By specialising to
the case of one variable, we show that the solution
of the key equation can be determined as a certain
minimal element in a Gröbner basis of the solution
module.


I.  Introduction 


In [IPE97] a modified Berlekamp-Massey algorithm was
presented as part of a (Hamming metric) decoding procedure
for BCH and RS codes defined over Galois rings. The problem
of constructing and decoding alternant codes over Galois
rings was addressed in [AIP98] by adapting the techniques of
[IPE97]. In this paper we give a new algorithm for decoding
alternant codes over Galois rings.

Let $P = GR(p^n, r_1)$ be a Galois ring of characteristic $p^n$
defined by a basic irreducible polynomial of degree $r_1$ over
$\mathbb{Z}_{p^n}$, and let $R = GR(p^n, r_2)$ be the Galois extension of $P$,
where $r_1 \mid r_2$, defined by a basic irreducible polynomial of
degree $r_2/r_1$ over $P$. Let $R^*$ be the group of units of $R$, and
let $G = \langle \zeta \rangle$ be the unique cyclic subgroup of $R^*$ of order $p^{r_2} - 1$,
whose elements are the roots of $x^{p^{r_2}-1} - 1$. An alternant code
$C(N, r, \alpha, \gamma, P)$ of length $N$ with symbols from $P$ is defined
as follows. Let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)$ be a vector of distinct
elements of $R$, with the condition that $\alpha_i - \alpha_j$ be a unit for
all $i \neq j$, and let $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_N)$ be a vector with non-zero
components $\gamma_i \in R^*$. The alternant code $C = C(N, r, \alpha, \gamma, P)$
is the $P$-submodule of $P^N$ defined by the parity check matrix

$$H = \begin{pmatrix} \gamma_1 & \gamma_2 & \cdots & \gamma_N \\ \gamma_1\alpha_1 & \gamma_2\alpha_2 & \cdots & \gamma_N\alpha_N \\ \vdots & \vdots & & \vdots \\ \gamma_1\alpha_1^{r-1} & \gamma_2\alpha_2^{r-1} & \cdots & \gamma_N\alpha_N^{r-1} \end{pmatrix}.$$

A straightforward modification of the BCH bound establishes
that $C$ has minimum Hamming distance greater than $q$. Error
polynomials, the syndrome polynomial $S$, the error locator
polynomial $\Sigma$, and the error evaluator polynomial $\Omega$ all take
the same form as their counterparts over a field and satisfy
the key equation $\Sigma S \equiv \Omega \bmod x^q$. The decoding problem is
equivalent to solving this congruence subject to certain
conditions.

In [F95] new algorithms corresponding to the Euclidean,
Berlekamp-Massey, and Peterson-Gorenstein-Zierler algorithms
for the solution of the key equation were derived using
Gröbner bases. Each of these algorithms is computationally
at least as efficient as its classical analogue [F95, FJ98]. This
approach has been extended to rational approximation and
interpolation problems, and to the solution of multivariable
congruences [F96, F97]. In this paper we apply similar
principles to decoding alternant codes defined over a Galois ring.

II. Gröbner bases in $R[x_1, \ldots, x_n]$

We generalise the theory of Gröbner bases to the specific
context of a Galois ring $R$. Many of our results are exact
analogues of those holding over a field. However, their proofs are
complicated by the change in significance of the coefficients,
which may be zero divisors in $R$. We establish a division
algorithm and the existence of Gröbner bases and give a
generalisation of Buchberger's algorithm in which, at each stage,
a set of (appropriately defined) S-polynomials to be included
in the new basis is augmented by certain $p$-power multiples of
the elements of the current basis.

III. Gröbner bases in $R[x]^2$ and decoding

The general structure of a Gröbner basis of a submodule of
$R[x]^2$ is given by

Theorem 1 Let $A$ be an $R$-submodule of $R[x]^2$. Then $A$ has
a Gröbner basis of the form

$$\{(a_0, b_0), \ldots, (a_{n-1}, b_{n-1}), (c_0, d_0), \ldots, (c_{n-1}, d_{n-1})\}$$

satisfying, for all $i, j \in \{0, \ldots, n-1\}$

i. $\mathrm{lm}(a_i, b_i) = (p^i x^{\partial a_i}, 0)$, $\mathrm{lm}(c_j, d_j) = (0, p^j x^{\partial d_j})$

ii. $\partial a_i < \partial a_j$ for $i > j$, $\partial d_i < \partial d_j$ for $i > j$.

Define the solution module $M = \{(a, b) : aS \equiv b \bmod x^q\}$.
It is easy to see that $\{(1, S), (0, x^q)\}$ is a basis of $M$. Our main
result is that the required solution may be found by converting
this to a Gröbner basis.

Theorem 2 The solution $(\Sigma, \Omega)$ of the key equation required
for decoding the alternant code $C(N, r, \alpha, \gamma, P)$ is (up to
equivalence) the minimal regular element of the Gröbner basis of the
solution module $M$ under the term order $<_{-1}$.
Abstract — A new algorithm that can calculate the
multiplicative inverses in GF($2^m$) with $O(\log_2 m)$ iterations
is presented. While this algorithm requires in total the same
number of multiplications ($\lfloor\log_2(m-1)\rfloor + H_w(m-1) - 1$)
as the best known algorithm [1], the latency, if mapped to
hardware, can be reduced significantly ($\lfloor\log_2(m-2)\rfloor + 1$),
comparable to the best case result, which is implemented
using Fermat's little theorem.

I.  Traditional  Algorithms 

One of the famous inversion algorithms is to calculate the
formula $x^{-1} = x^{2^m-2} = x^{2^1} x^{2^2} \cdots x^{2^{m-1}}$, following Fermat's
little theorem (Figure 1a). If this formula is mapped into a
sequential circuit, the latency is $m-2$ multiplications. If
mapped to a combinational circuit, the latency can be improved
to $\lfloor\log_2(m-2)\rfloor + 1$ by arranging multiplications like a tree.

In [1], Itoh and Tsujii proposed an improved algorithm that
requires the least number of multiplications ever known (Figure
1b). The latency is at most twice as long as that of Fermat's
theorem: $\lfloor\log_2(m-1)\rfloor + H_w(m-1) - 1$ multiplications, where
$H_w(\cdot)$ denotes Hamming weight. It is difficult to shorten the
latency, since the multiplications need to be performed in a
sequential manner.

II.  The  Proposed  Algorithm 

Figure 2 shows our new algorithm and Figure 3 shows some
example computation sequences using our algorithm. After
the $(\lfloor\log_2(m-1)\rfloor + 1)$-th iteration of the for-loop, a multiplicative
inverse is obtained in register $y_2$. Please note that the first
multiplication to $y_2$ and the last multiplication to $y_1$ are always
unnecessary, although this is not described in Figure 2, for
simplicity.

Clearly, our algorithm requires in total the same number of
multiplications as Itoh and Tsujii's algorithm. The latency,
however, is only $\lfloor\log_2(m-2)\rfloor + 1$ multiplications, since in the
for-loop, calculation of the values of $y_1$ and $y_2$ can be
performed in parallel, if mapped to hardware. This algorithm
can be used for any value of $m$ and any basis representation,
and can be implemented not only as a sequential circuit but
also as a combinational circuit. We believe that combining our
algorithm with the composite-field based method in [1,2] will
make a very fast and compact inversion circuit possible, if $m$ is
not a prime number.

Appendix.  Correctness  Proof  of  Our  Algorithm 

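Suppose that $m - 1$ is represented by

$$m - 1 = \sum_{i=0}^{s} 2^{t_i}, \quad \text{where } t_s > t_{s-1} > \cdots > t_0. \qquad (1)$$

From (1),

$$\forall k\ (s \geq k \geq 1):\ (m-1) \bmod 2^{t_k} = \sum_{i=0}^{k-1} 2^{t_i}, \qquad (2)$$

and also

$$(m-1) \bmod 2^{t_0} = 0. \qquad (3)$$

From Figure 2, the output of our algorithm $x_{out}$ is

$$x_{out} = \prod_{k=0}^{s} x^{(2^{2^{t_k}} - 1) \cdot 2^{((m-1) \bmod 2^{t_k}) + 1}}. \qquad (4)$$

From (2), (3) and (4),

$$x_{out} = x^{\sum_{k=0}^{s} (2^{u_k} - 2^{u_{k-1}})}, \quad \text{where } u_k = \left(\sum_{i=0}^{k} 2^{t_i}\right) + 1 \text{ and } u_{-1} = 1. \qquad (5)$$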


Figure  1.  Computation  sequences  of  multiplicative  inverse  (m  =  16)- 
(a)  Fermat's  little  theorem,  (b)  Itoh  and  Tsujii's  algorithm. 


y1 := x;
y2 := 1;

for k = 0 to ⌊log2(m − 1)⌋ do begin
  if (bit-k of (m − 1)) = 1 then
    y2 := y2 × (y1 ** 2^(((m − 1) mod 2^k) + 1));
  end if;
  y1 := y1 × (y1 ** 2^(2^k));   /* x ** (2^(2^(k+1)) − 1) will be stored */
end for;
write y2;

Figure  2.  The  proposed  iterative  algorithm. 


From (1) and (5),

$$x_{out} = x^{2^{u_s} - 2^{1}} = x^{2^m - 2} = x^{-1}. \qquad \square$$
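
The iteration of Figure 2 can be exercised in software to confirm that it returns $x^{2^m-2}$. The following is a minimal sketch that runs the two updates sequentially rather than in parallel; the bit-serial multiplier `gf_mul`, the helper `gf_pow2k` and the polynomial-basis representation are assumptions made only for illustration.

```python
def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m), polynomial basis.

    `poly` is the irreducible polynomial including its x^m term.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result

def gf_pow2k(a, e, m, poly):
    """Return a ** (2 ** e) using e successive squarings."""
    for _ in range(e):
        a = gf_mul(a, a, m, poly)
    return a

def gf_inverse(x, m, poly):
    """Inverse of a non-zero x following the iteration of Figure 2."""
    y1, y2 = x, 1
    k = 0
    while (1 << k) <= m - 1:          # k = 0 .. floor(log2(m - 1))
        if ((m - 1) >> k) & 1:
            shift = ((m - 1) % (1 << k)) + 1
            y2 = gf_mul(y2, gf_pow2k(y1, shift, m, poly), m, poly)
        # y1 becomes x ** (2 ** (2 ** (k + 1)) - 1).
        y1 = gf_mul(y1, gf_pow2k(y1, 1 << k, m, poly), m, poly)
        k += 1
    return y2
```

For example, with $m = 8$ and the polynomial $x^8 + x^4 + x^3 + x + 1$ (`poly = 0x11B`), `gf_mul(x, gf_inverse(x, 8, 0x11B), 8, 0x11B)` gives 1 for every non-zero `x`.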
Abstract — The Turbo code interleaver design problem
is considered for relatively large block sizes, where
the effect of trellis termination is less marked. An
optimised interleaver design technique based on simulated
annealing is proposed - performance is significantly
better than the Berrou-Glavieux interleaver
without an increase in delay.

I.  Introduction 

The  classical  use  of  interleavers  is  to  randomise  the  location 
of  errors,  enabling  the  use  of  random-error-correcting  codes 
on  channels  with  burst  error  patterns.  Turbo  coding  also  in¬ 
troduces  a  further  dimension  to  interleaver  requirements,  due 
to  the  effects  of  the  iterative  algorithm.  Most  optimised  in¬ 
terleaver  design  techniques  in  the  literature  are  based  on  the 
JPL’s  S-random  interleaver  algorithm  [1].  While  S-random 
interleavers  perform  well,  the  technique  was  not  intended  as  a 
basis  for  advanced  interleaver  design.  Its  main  shortcomings 
are  that  it  is  not  guaranteed  to  produce  the  required  inter¬ 
leaver  and  that  it  only  aims  at  achieving  a  spread  S. 

II.  Optimised  Interleaver  Design 
Simulated annealing [2] can be used to design optimised
interleavers by defining an energy function based on a predefined
set of requirements. We use a random interleaver as an initial
state, and define the perturbation function as a swap of
two random interleaver entries, ensuring that the interleaver
is always valid. The energy function used is:

u  =  y  , . . 5  _  (i) 

tT  V'(i-J')a  +  [A(0-A0-)]2 

where $i, j \in [0, r - 1]$, $r$ is the block size, $v$ is the encoder
memory, and $\lambda()$ is the interleaving function. This energy
function attempts to 'push' bit-pairs away from the origin in
the Input-Output Distance Spectrum (IODS)$^1$, increasing the
spread of the interleaver. In contrast with the JPL technique
it does not guarantee a particular spread; however, it pushes
points away from the origin even beyond the spread boundary.
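
The annealing loop described above can be sketched as follows. The exact energy function is not legible in the source, so the one used here is an assumption: it sums the inverse Euclidean distance from the origin of nearby bit-pairs in the input-output plane. The names `energy`, `anneal`, `window`, `t0` and `cooling` are likewise hypothetical.

```python
import math
import random

def energy(perm, window=4):
    """Assumed energy: inverse IODS distance summed over nearby bit-pairs."""
    n = len(perm)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, min(i + window + 1, n)):
            total += 1.0 / math.hypot(i - j, perm[i] - perm[j])
    return total

def anneal(n, steps=2000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over permutations using random swaps."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)                 # random interleaver as initial state
    current = energy(perm)
    best, best_e = perm[:], current
    temp = t0
    for _ in range(steps):
        a, b = rng.randrange(n), rng.randrange(n)
        perm[a], perm[b] = perm[b], perm[a]   # a swap keeps perm valid
        candidate = energy(perm)
        accept = (candidate <= current
                  or rng.random() < math.exp((current - candidate) / temp))
        if accept:
            current = candidate
            if current < best_e:
                best, best_e = perm[:], current
        else:
            perm[a], perm[b] = perm[b], perm[a]   # undo the swap
        temp *= cooling
    return best, best_e
```

Because the only perturbation is a swap, every intermediate state is a valid permutation, and further design restrictions could be added as extra terms of the energy.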

III.  Results 

We restrict ourselves to unpunctured rate-1/3 symmetric Turbo
codes with $v = 2$ and generator$^2$ (1, 5/7). In order to avoid
the effects of trellis termination, we also choose $r = 1024$. As
a reference for performance, we implement a uniform interleaver
by using a different random interleaver for every block
simulated [3]. We compare our interleaver design with this
uniform interleaver, a rectangular interleaver, the design used
by Berrou and Glavieux [4], and an S-random interleaver in
Fig. 1. Our design achieves a BER of $10^{-5}$ at $E_b/N_0 \approx 1.35$ dB.

¹ A two-dimensional histogram, with the axes being the distance
between bit-pairs at the input and output of the interleaver [3].

² Polynomials are denoted as $g_a$ or $g_a/g_b$, where $g_a$ is the
feedforward and $g_b$ is the feedback polynomial, in octal.
Fig. 1: Turbo code BER simulation (10 iterations)


IV.  Conclusions 

Our new interleaver design performs at least as well as the
S-random interleaver. However, using our technique it is easier
to include design restrictions, for example to make the
interleaver correctly-terminating or odd-even. Also, more sophisticated
energy functions matched to the component codes
may be considered, particularly for use with punctured codes.
Utilising some performance enhancement techniques, the complexity
of the energy function grows only as $O(r)$, making it
suitable for use with large block sizes.
Abstract — We propose a new algorithm for Turbo
code interleaver design, which is based on the conventional
s-random approach and whose complexity
grows only linearly with the interleaver length.

Designing the interleaver $\pi = (\pi_1; \ldots; \pi_K)$ of length $K$ of a
Turbo code serves to increase the code’s minimum distance
$\delta_{\min}$ and hence to lower the error floor of the Word and Bit
Error Rates (WER/BER). An efficient method was presented
in [1]. Examinations show that for so-designed interleavers,
the codeword at $\delta_{\min}$ is mainly caused by a combination of
an input word $u^{(1)}$ of the first component encoder (identical
to the Turbo encoder input $u$) and a second component input
word $u^{(2)}$ as shown in Fig. 1. In this example, “1001”
represents an error pattern, i.e. an input sequence causing a
short error event in a component code trellis. The s-random
interleaver $\pi$ does not prevent the four “1”s in the two error
patterns of $u^{(1)}$ from being mapped crosswise to two error patterns
in $u^{(2)}$, since the two “1”s belonging to each error pattern
in $u^{(1)}$ are spread to distant positions in $u^{(2)}$, and hence the
spreading condition of [1] is satisfied. However, this unlucky
mapping of positions can be avoided and $\delta_{\min}$ can be increased
by modifying the interleaver design algorithm.

Figure 1: Unlucky mapping of positions

The proposed algorithm incorporates the s-random
method of [1] and hence successively determines $\pi_1$ to $\pi_K$.
In step $l$, we set up the set $A_l \subset \{1; \ldots; K\}$ of possible values
for $\pi_l$, which have not already been assigned to $\pi_t$ in earlier
steps $t < l$, and which satisfy the spreading condition of [1].
Moreover, step $l$ consists in determining and discarding values
of $A_l$ which would cause an unlucky mapping like in Fig. 1.

Determining these unfavourable values in $A_l$ can be done
very efficiently using a recursive backtracking approach, which
is briefly outlined using the example of Fig. 1. Our basic observation
is that any “1” present in $u^{(1)} = (u^{(1)}_1; \ldots; u^{(1)}_K)$ or
$u^{(2)} = (u^{(2)}_1; \ldots; u^{(2)}_K)$, respectively, must belong to an error
pattern. Otherwise the associated codeword has large weight
and can be ignored, since we consider and try to avoid only
low weight codewords. In step $l$, we consider exclusively $u^{(1)}$
with $u^{(1)}_l = 1$ and $u^{(1)}_t = 0$, $\forall t > l$. Our starting point for
the backtracking is that the “1” in $u^{(1)}_l$ must belong to an error
pattern (as reasoned above). Every possible error pattern
must be considered, and for each of them, we must proceed in

*M. Breiling is sponsored by the Fraunhofer Gesellschaft - Institut
für Integrierte Schaltungen, Erlangen.
a backtracking manner. In our example of Fig. 1, we consider
only the error pattern “1001” in $u^{(1)}_{l-3}$ to $u^{(1)}_l$. Since $\pi_{l-3} = j$
has already been determined, we know that $u^{(2)}_j = 1$. Following
the above reasoning, the “1” in $u^{(2)}_j$ must belong to an
error pattern, for which we must consider every possibility. In
the Fig., we consider “1001” in $u^{(2)}_{j-3}$ to $u^{(2)}_j$. For the case that
$j - 3$ has earlier been assigned to $\pi_m$, $m < l$, we conclude
that $u^{(1)}_m = 1$. Every possible new error pattern containing
$u^{(1)}_m = 1$ must be considered in $u^{(1)}$ (in the Fig. “1001” in
$u^{(1)}_m$ to $u^{(1)}_{m+3}$). Finally, for $\pi_{m+3} = i$, we find that $u^{(2)}_i = 1$.
We must thus discard $i - 3$ from $A_l$, since this prevents the
assignment $\pi_l = i - 3$, which would otherwise complete the
unlucky mapping in Fig. 1. When all unfavourable values have
been discarded from $A_l$, then $\pi_l$ is randomly chosen from the
remaining values. The backtracking algorithm works also for
error patterns of weight > 2. The complexity of a complete
interleaver design grows linearly with $K$.

We verified the proposed algorithm by designing an interleaver
of length $K = 200$ for a Turbo code of rate 1/2
employing $M = 2$ component codes (generator polynomials:
(1; 5/7)). In the design, we used $s = 8$ and considered all error
patterns of weight < 3. For a termination of both component
trellises, this Turbo code attains $\delta_{\min} = 14$. Fig. 2 shows
the WER (upper curves) and BER (lower curves) for varying
$E_b/N_0$ (received energy per information bit over the one-sided
noise power spectral density) for a simulated transmission using
coded BPSK over an AWGN channel. The performance
is compared to using a pure s-random interleaver [1] with
$s = 10$ (expected $\delta_{\min} < 12$) and a uniform interleaver [2]
(mean $\delta_{\min} < 6$) of the same length. We can clearly see the
improved BER and particularly WER for higher $E_b/N_0$.
Abstract — In this paper the design of interleavers
for Turbo Codes is considered. The proposed algorithm
is based on a Hamming weight cost matrix. It
optimizes both the minimal distance of Turbo Codes
and the passing of extrinsic information. Simulation
results show that for short lengths these interleavers
improve the error performance at high SNR.


I.  Introduction 

It is widely accepted that the interleaver is the key element of
Turbo Codes [1], [2]. In order to optimize the distance spectrum
and the minimal distance of Turbo Codes, the interleaver
should map input sequences $u(D)$ which generate low
weight output sequences $y_1(D)$ to interleaved sequences
$v(D)$ which generate high weight output sequences $y_2(D)$, and
vice versa. Due to the iterative structure of the turbo decoder,
the interleaver should also guarantee a good passing of extrinsic
information from one decoder to the other. The proposed
interleaver optimizes both criteria. In order to increase
the minimal distance, a Hamming weight cost matrix
is used for the construction. The second goal is achieved since
the proposed interleaver belongs to the family of cycle optimized
interleavers [3]. The interleaver is built element by
element using a tree search method.

Let $u = [u_0, u_1, \ldots, u_{N-1}]$ and $v = [v_0, v_1, \ldots, v_{N-1}]$ respectively
be the input and output sequences of the interleaver.
We have the relation $v = uI$, where $I = \{a_{ij}\}_{N \times N}$
with $a_{ij} \in \{0, 1\}$ and $\sum_i a_{ij} = \sum_j a_{ij} = 1$. We can
also define the interleaver with the permutation vector $E =
[e(0), e(1), e(2), \ldots, e(N-1)]$, where $e(i) = j \Leftrightarrow a_{ij} = 1$.


II.  Interleaver  Design 


For the construction of $E$, we will use a cost matrix $J$
of the same dimension as $I$: $J = \{b_{ij}\}_{N \times N}$, where $b_{ij}$ is equal to the
Hamming weight of the lowest Hamming weight codeword generated
from the input sequences $u(D)$ with Hamming weight
$w < w_{\max}$, supposing $a_{ij} = 1$. Each new element $e(n)$ is
chosen according to both criteria defined previously.

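1. initialization:

$b_{ij}(0) = +\infty \quad \forall i, j$
$e(0)$ is chosen randomly

2. for ($n = 1, 2, \ldots, N-1$):

- update of $b_{ij}(n)$ ($i > n$) $\forall j$:

$$b_{ij}(n) = \min\Big\{ b_{ij}(n-1),\ \min_{u(D) \in C} \Big( w + \sum_k y_{1k} + \sum_k y_{2k} \Big) \Big\}$$

with $C = \{u(D) = D^{l_0} + D^{l_1} + D^{l_2} + \cdots + D^{l_{w-1}}\}$,

$w = \sum_{k=0}^{N-1} u_k < w_{\max}$ and $w < n + 1$,

with $l_0 < l_1 < \cdots < l_{w-3} < n - 1$, (1)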

I  u» — 2  — -  r  i  1,  I  in — i  2  n  1  (2) 

- choice of $e(n)$:

let $\mathcal{E}_1 = \{j \mid b_{nj} > D_{\max}\}$ (3)

$\mathcal{E}_2 = \{j \mid |i - j| + |e(i) - e(j)| > L,\ i = n-1, n-2, \ldots, n-L+2\}$ (4)

$e(n) \in \mathcal{E} = \mathcal{E}_1 \cap \mathcal{E}_2$. $e(n)$ is chosen randomly in $\mathcal{E}$.
Equations (1) and (2) reduce the set $C$ of input sequences
$u(D)$ to test for each $n$. Equation (3) corresponds to the minimal
distance constraint. Equation (4) corresponds to the cycle
optimized constraint, which imposes that two bits separated by
$\lambda$ bits ($\lambda < L - 2$) in the input sequence $u(D)$ should be separated
by at least $L - 2 - \lambda$ bits in the sequence $v(D)$. $\mathcal{E}$ is the
set of all the new positions satisfying both constraints. This
method allows us to build an interleaver with minimal distance
$D_{\max}$ and minimum cycle $L$. From [3], it is possible to
build an interleaver with $L < \sqrt{N} + 2$. If the tree search fails
($\mathcal{E} = \emptyset$), the procedure should be started again. To obtain an
interleaver with the greatest minimal distance, the procedure
must be repeated by increasing the value $D_{\max}$ until it is no
longer possible to build the interleaver.

III.  Results  and  Conclusion 

Simulations using a rate $R = 1/3$ Turbo code with two 8-state
RSCs with generator $(15/17)_8$ were performed. For $N = 105$
bits,  the  parameters  obtained  with  wmax—  3  are  Dmax  —  W 
and $L = 10$. Simulation results show that, at high SNR, the
performance of Turbo Codes using this interleaver is 0.1 dB
better than that of the best interleavers available in the literature [4].
Abstract — A new interleaver design to improve the
performance of Turbo codes is presented here.
Two criteria are considered in the design of the interleaver:
the distance spectrum properties of the code
and the correlation between the input information
data and the soft output of each decoder corresponding
to its parity bits. A deterministic interleaver design
based on these criteria is also proposed here.

I.  Introduction 

Turbo codes [1] have an impressive near-Shannon-limit error
correcting performance. This superior performance of Turbo
codes compared to convolutional codes is only achievable when
the length of the interleaver is very large, on the order of
several thousand bits. For large block size interleavers, most
random interleavers perform well.

An interleaver $\pi$ is a permutation $i \mapsto \pi(i)$ that maps a
data sequence of $N$ input symbols into the same sequence in a
new order. An S-random [2] interleaver is a semi-random interleaver
that performs better than most random interleavers.
Each randomly selected integer is compared to the $S$ previously
selected random integers. If the distance between this integer
and each of the previously selected random integers is greater than $S$,
then it is selected. Otherwise, a new random integer is
chosen, and this process is repeated until all $N$ distinct integers
are selected in this random order. This interleaver design
ensures that short cycle events are avoided. A short cycle
event occurs when two bits are close to each other both before and
after interleaving.
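
The S-random selection rule described above can be sketched as follows. This is a minimal illustration rather than the construction of [2]: candidates are drawn from a shuffled pool, a dead end triggers a full restart, and the function name, the restart limit and the 0-based indexing are assumptions made for the example.

```python
import random


def s_random_interleaver(n, s, seed=0, max_restarts=1000):
    """Return a permutation of range(n) whose last-s neighbours differ by more than s."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(n))
        rng.shuffle(remaining)
        perm = []
        while remaining:
            recent = perm[max(0, len(perm) - s):]   # the S previous selections
            for idx, cand in enumerate(remaining):
                if all(abs(cand - p) > s for p in recent):
                    perm.append(remaining.pop(idx))
                    break
            else:
                break                                # dead end: restart from scratch
        if not remaining:
            return perm
    raise RuntimeError("no S-random interleaver found; try a smaller s")
```

For example, `s_random_interleaver(64, 3)` returns a length-64 permutation in which any two positions at most 3 apart are mapped to values more than 3 apart. The search typically becomes slow or fails when `s` is chosen too large relative to `n`.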

II. 2-Step S-Random Interleaver Design

A new interleaver design, the 2-step S-random interleaver, is
presented here based on the S-random interleaver. The 2-step
S-random interleaver is designed under the constraint to increase
the minimum effective free distance of the Turbo code
without increasing the correlation properties between the information
input data sequence and the soft output of each
decoder corresponding to its parity bits. The criterion used in
the second step of the design is based on the revised version of the
iterative decoding suitability (IDS) condition that is described
in [3-4].

Step 1: Each randomly selected integer $\pi(i)$ is compared
with the previous selections $\pi(j)$ to check that if $i - j < S_1$
then $|\pi(i) - \pi(j)| > S_1$. We also insist that $\pi$ must satisfy
$|i - \pi(i)| > S_2$. $S_1$ and $S_2$ are two constants.

Step 2: Choose the maximum pre-determined weight $w_{det}$ for
input data sequences and the minimum permissible effective
free distance $d_{min,w_{det}}$ of the code. Find all input data sequences of
length $N$ and weight $w_i < w_{det}$ and their corresponding effective
free distance $d_{w_i}$ for the Turbo encoder with an interleaver
design based on step 1 such that $d_{w_i} < d_{min,w_{det}}$. All these
input data sequences are divisible before and after interleaving
by the feedback polynomial (usually a primitive polynomial)
of the Turbo encoder. Consider the first input data block of
weight $w_1$ with non-zero elements in locations $(i_1, i_2, \ldots, i_{w_1})$
and $d_{w_1} < d_{min,w_{det}}$. Compute $IDS_1^{(new)}$ based on [4] for the
original interleaver designed in step 1. Set $j = i_1 + 1$ and find
the pair $(j, \pi(j))$. Interchange the interleaver pairs $(i_1, \pi(i_1))$
and $(j, \pi(j))$ to create a new interleaver, i.e., $(i_1, \pi(j))$ and
$(j, \pi(i_1))$. Compute the new IDS, $IDS_2^{(new)}$, based on the
new interleaver design. If $IDS_2^{(new)} < IDS_1^{(new)}$, replace the
interleaver by the new one. Otherwise, set $j = j + 1$ and continue.
Repeat this operation for all input data sequences with
a minimum weight of $w_i < w_{det}$ and $d_{w_i} < d_{min,w_{det}}$. After
completing this operation, return to step 2 and find all input
data sequences of weight $w_i < w_{det}$ with $d_{w_i} < d_{min,w_{det}}$
for the new interleaver. Continue this step until it converges
and there is no input data sequence of weight $w_i < w_{det}$ with
$d_{w_i} < d_{min,w_{det}}$. Obviously, if $d_{min,w_{det}}$ is set to too large a
value, the second step may never converge, and in this case
$d_{min,w_{det}}$ should be reduced.

An interleaver design proposed in [6] is based on the joint
S-random criteria and elimination of all error patterns of weight
$w_i$. However, in practice the joint optimization criteria will
not converge easily, and therefore the value of $S$ must be reduced
and $w_i$ restricted to only weight-two inputs. By separating
these two criteria into two steps, we can easily find the
appropriate interleaver satisfying each step separately.

In some applications we need to have a deterministic interleaver
to reduce the hardware requirements for the Turbo encoder
and decoder. The following theorem describes a technique
to design a deterministic interleaver based on step 1.

Theorem 1: Let $a \in \mathbb{N}$ be a natural number such that
gcd(a,  N)  =  1  and  a  —  1  divides  N.  Then  there  is  a  permuta¬ 
tion  IT  £  Sn  satisfying  (i)  and  (ii)  with  Si  :=  min(a,  ^L.)  and 
Sz  ■=  Let  (3  :=  and  define  7r  :  {1,  ...,N}  — > 

$\{1, \ldots, N\}$ by $\pi(i) := a \cdot i + \beta$, where $\pi(i)$ has to be interpreted
as the number $\pi(i) \in \{1, \ldots, N\}$ that is congruent to $a \cdot i + \beta$
modulo $N$.
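
The deterministic map in the theorem can be sketched as follows. This is an illustrative sketch only: the offset `beta` is left as a free parameter with a hypothetical default of 0, the helper `spread_ok` tests the step 1 spreading condition in the form "positions at most `s1` apart map to values more than `s1` apart" (an assumed reading), and the sketch requires only that `a` and `n` be coprime so that the map is a permutation.

```python
from math import gcd


def linear_interleaver(n, a, beta=0):
    """pi(i) = a*i + beta reduced modulo n into {1, ..., n}, for i = 1..n."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime for pi to be a permutation")
    return [((a * i + beta - 1) % n) + 1 for i in range(1, n + 1)]


def spread_ok(perm, s1):
    """True if positions at most s1 apart are mapped to values more than s1 apart."""
    n = len(perm)
    return all(abs(perm[i] - perm[j]) > s1
               for i in range(n) for j in range(max(0, i - s1), i))
```

For instance, `linear_interleaver(20, 3)` starts 3, 6, 9, ... and wraps around modulo 20; here $a - 1 = 2$ divides $N = 20$ and $\gcd(3, 20) = 1$, as the theorem requires.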
Abstract — We show that an optimal source code
with cost function for code symbols can be regarded
as a random number generator generating a random
sequence (not necessarily a sequence of fair coin bits)
as the target distribution in the sense that the normalized
conditional divergence between the distribution
of the generated codeword and the target
distribution vanishes as the block length tends to
infinity.

I.  Introduction 

In 1998, Visweswariah et al. [1] and Han [2] independently
showed that an optimal variable-length source code can
be regarded as a variable-length random number generator in
the sense that the normalized divergence distance between the
distribution of the generated codeword process and the uniform
distribution vanishes as the block length tends to infinity.

On the other hand, as is well known, if we impose unequal
costs on code symbols, it is no longer optimal to use the
code which minimizes the average codeword length. Karp [3]
has given an algorithm for constructing minimum-redundancy
prefix codes with unequal cost symbols. Naturally, there
would exist a bias in the frequency of code symbols generated
by an optimal source code with cost. Can we then consider
the optimal variable-length source code with cost as a
variable-length nonuniform random number generator? The
purpose of this study is to demonstrate that the answer to
this question is “yes”.

II.  Variable-length  Source  Coding  with  Cost 

Let $\mathcal{X}$ be a countably infinite source alphabet and $\mathcal{Y}$ be a
finite code alphabet. In the sequel all the logarithms
are taken to the base $K = |\mathcal{Y}|$, where $|\mathcal{Y}|$ denotes
the cardinality of $\mathcal{Y}$. We denote the set of all non-null
finite-length sequences taken from $\mathcal{Y}$ by $\mathcal{Y}^*$. Let us now define
a general source as an infinite sequence $\mathbf{X} = \{X^n =
(X_1^{(n)} \cdots X_n^{(n)})\}_{n=1}^{\infty}$ of $n$-dimensional random variables $X^n$,
where each component random variable $X_i^{(n)}$ ($1 \le i \le n$)
takes values in $\mathcal{X}$. The class of sources thus defined covers a
very wide range of sources, including all nonstationary and/or
nonergodic sources.

Next, we define the cost function $c : \mathcal{Y}^* \to \mathbf{R}^+ = (0, +\infty)$
as follows: First, each symbol $y \in \mathcal{Y}$ is assigned the corresponding
cost $c(y)$ such that $0 < c(y) < +\infty$ ($\forall y \in \mathcal{Y}$), and
then the additive cost $c(\mathbf{y})$ of $\mathbf{y} = (y_1, y_2, \ldots, y_k) \in \mathcal{Y}^k$ is
defined by $c(\mathbf{y}) = \sum_{i=1}^{k} c(y_i)$.

Definition 1: $R$ is called an achievable variable-length
source coding cost-rate for the source $\mathbf{X}$ if there exists
a variable-length prefix encoder $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^*$,
given the cost function $c : \mathcal{Y}^* \to \mathbf{R}^+$, such that
$\limsup_{n\to\infty} \frac{1}{n} E\{c(\varphi_n(X^n))\} \le R$, and the infimum of $R$
that are achievable variable-length source coding cost-rates
is denoted by $R_l(\mathbf{X})$, which we call the infimum achievable
variable-length source coding cost-rate.

¹ O. Uchida is now with the Dept. of Network Engineering, Kanagawa
Institute of Technology, Atsugi, Kanagawa, 243-0292 Japan.

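Theorem 1: For any general source $\mathbf{X}$, we have

$$R_l(\mathbf{X}) = \frac{1}{\alpha_c} \limsup_{n\to\infty} \frac{1}{n} H(X^n),$$

where the cost capacity $\alpha_c$ is the unique positive root
$\alpha$ of the equation $\sum_{y \in \mathcal{Y}} K^{-\alpha c(y)} = 1$ and $H(X^n) =
-\sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \log P_{X^n}(\mathbf{x})$.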
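
The cost capacity is defined only implicitly, as the root of $\sum_{y} K^{-\alpha c(y)} = 1$, so a small numerical sketch may help. The code below finds the root by bisection; the function name, the tolerance and the example costs are illustrative assumptions, and the code alphabet is assumed to have at least two symbols.

```python
import math


def cost_capacity(costs, tol=1e-12):
    """Positive root alpha of sum_y K**(-alpha*c(y)) = 1, with K = len(costs) >= 2."""
    k = len(costs)

    def excess(alpha):
        # Strictly decreasing in alpha; equals K - 1 > 0 at alpha = 0.
        return sum(k ** (-alpha * c) for c in costs) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:          # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = cost_capacity([1.0, 2.0])              # binary alphabet, costs 1 and 2
probs = [2 ** (-alpha * c) for c in (1.0, 2.0)]  # K**(-alpha*c(y)), sums to one
```

With equal unit costs the root is $\alpha_c = 1$, which recovers ordinary length-based coding. With costs 1 and 2 on a binary alphabet the root is $\log_2$ of the golden ratio, and the induced symbol probabilities $K^{-\alpha_c c(y)}$ are biased toward the cheaper symbol.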

III.  Source  Code  with  Cost  as  A  Nonuniform 
Random  Number  Generator 

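Given a variable-length prefix encoder $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^*$, we
define $\mathcal{D}_m = \{\mathbf{x} \in \mathcal{X}^n \mid l(\varphi_n(\mathbf{x})) = m\}$ for any positive integer
$m$, where $l(\cdot)$ denotes the length of a string, and we put
$\mathcal{J}(\varphi_n) = \{m \mid \Pr\{X^n \in \mathcal{D}_m\} > 0\}$. For any $m \in \mathcal{J}(\varphi_n)$, we
define $X^n_m$ as the random variable taking values in $\mathcal{D}_m$ with
the distribution given by $P_{X^n_m}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in \mathcal{D}_m\}}$ ($\mathbf{x} \in \mathcal{D}_m$).

For any positive integer $m$, $V^{(m)}$ indicates an i.i.d. sequence
of length $m$. Let us now define the conditional divergence by

$$D(\varphi_n(X^n) \| V^{(I_n)} \mid I_n) = \sum_{m \in \mathcal{J}(\varphi_n)} \Pr\{I_n = m\}\, D(\varphi_n(X^n_m) \| V^{(m)}),$$

where $I_n$ is the random variable such that $I_n = m$ for $X^n \in
\mathcal{D}_m$.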

Then,  we  have  the  following  main  theorem. 

Theorem 2: We assume that the entropy rate of the general
source $\mathbf{X}$ has the limit $\lim_{n\to\infty} \frac{1}{n} H(X^n)$. Let $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^*$
be any optimal variable-length prefix encoder in the sense that

$$\lim_{n\to\infty} \frac{1}{n} E\{c(\varphi_n(X^n))\} = R_l(\mathbf{X}).$$

If we define the probability distribution $q_c = \{q_c(y)\}_{y \in \mathcal{Y}}$ corresponding
to the cost function $c$ by $q_c(y) = K^{-\alpha_c c(y)}$ ($y \in
\mathcal{Y}$), then we have

$$\lim_{n\to\infty} \frac{1}{n} D(\varphi_n(X^n) \| V^{(I_n)} \mid I_n) = 0,$$

where $V^{(m)}$ stands for the i.i.d. sequence of length $m$ subject
to the distribution $q_c$.

References 

[1] K. Visweswariah, S. R. Kulkarni and S. Verdú, “Source codes
as random number generators,” IEEE Trans. Inform. Theory,
vol. 44, pp. 462-471, Mar. 1998.

[2] T. S. Han, Information-Spectrum Methods in Information Theory,
Baifukan-Press, Tokyo, 1998 (in Japanese).

[3] R. M. Karp, “Minimum redundancy coding for the discrete
noiseless channel,” IRE Trans. Inform. Theory, vol. IT-7,
pp. 27-38, Jan. 1961.




0-7803-5857-0/00/$10.00 ©2000 IEEE.


ISIT  2000,  Sorrento,  Italy,  June  25-30,2000 


Random Number Approximation Problem for Discrete Memoryless Sources

Yasutada Oohama
Graduate School of Information Science
and Electrical Engineering, Kyushu University
6-10-1 Hakozaki, Higashi-ku, Fukuoka 812, Japan
e-mail: oohama@csce.kyushu-u.ac.jp


Abstract — We consider the simulation problem of
generating random sequences from an arbitrary prescribed
discrete memoryless source (DMS) by using
a random sequence from a given DMS. We propose
two simple algorithms and give some explicit results
for their asymptotic performances.

I.  Introduction 

The simulation problem of generating random sequences from
a prescribed information source by using a random sequence
from a given information source is called the random number
problem. Recently, simple and efficient algorithms for the random
number problem and the analysis of their performances
were studied by Han and Hoshi [1], Uyematsu and Kanaya
[2] and Oohama [3]. We deal with the simulation of generating
random sequences of fixed length from an arbitrary prescribed
discrete memoryless source (DMS) by using a random
sequence of fixed length from a given DMS. We propose two
simple algorithms and derive explicit results for their asymptotic
performances. Our results contain some of the results of
Uyematsu and Kanaya [2] and Oohama [3] as special cases.

II.  Random  Number  Approximation  Problem 

Let $X$ and $Y$ be random variables taking values in finite
sets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The distributions of $X$ and $Y$
are denoted by $P_X = \{P_X(x)\}_{x \in \mathcal{X}}$ and $P_Y = \{P_Y(y)\}_{y \in \mathcal{Y}}$,
respectively. Let $\mathcal{P}(\mathcal{X})$ and $\mathcal{P}(\mathcal{Y})$ denote the sets of all probability
distributions on $\mathcal{X}$ and $\mathcal{Y}$, respectively. Consider two
stationary discrete memoryless sources $\{X_t\}_{t=1}^{\infty}$ and $\{Y_t\}_{t=1}^{\infty}$.
For each $t = 1, 2, \ldots$, $X_t$ and $Y_t$ obey the same distributions as
those of $X$ and $Y$, respectively. We write random sequences of
lengths $n$ and $m$ from the information sources as $X^n = X_1 X_2 \cdots
X_n$ and $Y^m = Y_1 Y_2 \cdots Y_m$, respectively.

The fixed-to-fixed random number approximation problem
discussed here is as follows. Let $\varphi_n : \mathcal{Y}^m \to \mathcal{X}^n$. Let
$\Phi_n(r)$ denote the set of all maps $\varphi_n$ that satisfy the rate
constraint $m \le nr$. By the map $\varphi_n$, the random sequence
$Y^m$ is transformed into the sequence $\varphi_n(Y^m)$, which is used
as an approximation of the random sequence $X^n$. We consider
the approximation error measured by the variational distance
between the distributions of $\varphi_n(Y^m)$ and $X^n$, denoted
by $d(\varphi_n(Y^m), X^n)$.

Next, we explain our proposed algorithms for approximation.
Let $i_X$ be a one-to-one map from $\mathcal{X}^n$ to $\{1, 2, \ldots, |\mathcal{X}|^n\}$,
where $|A|$ denotes the cardinality of the set $A$. Let $P_X(x^n)$,
$x^n \in \mathcal{X}^n$, denote the probability of $x^n$ and $I_X(x^n)$ be a subinterval
of $[0, 1)$ given by $I_X(x^n) = [L_X(x^n), L_X(x^n) + P_X(x^n))$,
where $L_X(x^n) = \sum_{a^n :\, i_X(a^n) < i_X(x^n)} P_X(a^n)$. Definitions and notations
for $Y$ are the same as those for $X$. In the proposed
algorithms the map $\varphi_n$ has the following form. For $y^m \in \mathcal{Y}^m$
define $\varphi_n(y^m) = x^n$ if $L_Y(y^m) \in I_X(x^n)$. In the arithmetic
algorithm the map $i_X$ is determined according to some lexicographical
order of sequences in $\mathcal{X}^n$. In the sorting algorithm
the map $i_X$ is determined according to the descending order
of values of probabilities of sequences in $\mathcal{X}^n$. The definition
of $i_Y$ for $Y$ is the same as that for $X$.
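
The interval construction above can be illustrated on very short sequences. The sketch below enumerates all sequences explicitly, so it is only usable for tiny $n$ and $m$; exact rational arithmetic avoids boundary errors. The function names are hypothetical, and the variational distance is taken here as the plain sum of absolute differences, which is an assumption about normalisation.

```python
from fractions import Fraction
from itertools import product


def seq_prob(seq, pmf):
    """Probability of a sequence under a memoryless source with symbol pmf."""
    p = Fraction(1)
    for symbol in seq:
        p *= pmf[symbol]
    return p


def ordered_intervals(pmf, length, sorting=False):
    """List of (sequence, L, L + P) in lexicographic or descending-probability order."""
    seqs = list(product(sorted(pmf), repeat=length))
    if sorting:
        seqs.sort(key=lambda s: seq_prob(s, pmf), reverse=True)
    out, low = [], Fraction(0)
    for s in seqs:
        p = seq_prob(s, pmf)
        out.append((s, low, low + p))
        low += p
    return out


def approximation_error(p_x, n, p_y, m, sorting=False):
    """Variational distance between the law of phi_n(Y^m) and that of X^n."""
    x_iv = ordered_intervals(p_x, n, sorting)
    y_iv = ordered_intervals(p_y, m, sorting)
    induced = {s: Fraction(0) for s, _, _ in x_iv}
    for _, y_low, y_high in y_iv:
        if y_high == y_low:
            continue                      # zero-probability input sequence
        for x_seq, lo, hi in x_iv:
            if lo <= y_low < hi:          # L_Y(y^m) falls in I_X(x^n)
                induced[x_seq] += y_high - y_low
                break
    return sum(abs(induced[s] - (hi - lo)) for s, lo, hi in x_iv)
```

As a usage example, with a fair coin as the given source and a target with probabilities 3/4 and 1/4, one coin flip ($m = 1$, $n = 1$) gives an error of 1/2, while two flips ($m = 2$) reproduce the target exactly.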

To  state  our  results  for  the  performances  of  the  above  two 
algorithms,  set 

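$$F_\lambda(R, P_X) = \min_{P \in \mathcal{P}(\mathcal{X})} \left\{ [\lambda(R - H(P) - D(P \| P_X))]^+ + D(P \| P_X) \right\},$$

$$F_+(R, P_X) = \lim_{\lambda \to \infty} F_\lambda(R, P_X),$$

$$F_-(R, P_X) = \lim_{\lambda \to -\infty} F_\lambda(R, P_X),$$

where $[t]^+ = \max\{0, t\}$. Let $R_-(P_X) = \min_{x \in \mathcal{X}}(-\log
P_X(x))$, $R_+(P_X) = \max_{x \in \mathcal{X}}(-\log P_X(x))$ and set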

ns=  {(R.R):  R  >  rR-(I\') , 

rF.(^.Py)  <R<R+(Px)}. 

Define  two  functions  by 


E4r.Px.Py) 


=  min  max /fi(  R.  PX  ).  rF-  (  — ,  Py')  \ 
)  L  \r  J I 

E4r.Px.Py)  =  miu  {  [7?  -  /?]  + 


+  max  {f+(R.  Px  ),  rF-  (£.  Pv)  }  j . 


Our  main  results  are  as  follows. 


Theorem  1  For  any  r  >  0  and  the  sequence  of  maps 
defined  by  the  arithmetic  algorithm 

lim  (--)  logd(Vp„(Ym),A’')  >  E4r.Px.Py).  0) 
ii  — oo  \  n  J 

Theorem  2  For  any  r  >  0  and  the  sequence  of  maps 
{ip„  :  y >„  €  $„(r)}*=1  defined  by  the  sorting  algorithm 

liiii^  (-i)  logd(f4Ym).Xv)  >  E4r.Px.Py).  (2) 
Abstract — Source conversion by means of two binary
prefix codes is considered. A source sequence is
encoded into a stream of codewords by the first code.
Then, the stream is parsed into codewords of the second,
which consequently produces another source sequence.
The conversion rate is investigated in the
context of combined source-(d,k) coding.

I.  Introduction 

In the combined source-(d,k) coding scheme as introduced
by Kerpez [1], an arithmetic code performs source coding and
(d,k)-constrained channel coding simultaneously. While its
mechanism looks very similar to the interval algorithm for
generating random numbers proposed by Han and Hoshi [2],
Kerpez’s code essentially consists of an arithmetic encoder
for an information source and an arithmetic decoder for the
maxentropic source associated with the (d,k) constraints [3].
Using these two codes, Kerpez’s code converts a given source
to another, say to the maxentropic source.

To study source conversion more generally, we investigate
the coding performance of the combination of two binary
prefix codes $\phi_1$ and $\phi_2$: the first code $\phi_1$ maps an information
source sequence $X = X_1 X_2 \cdots$ over $\mathcal{X} = \{a_1, \ldots, a_D\}$
into the intermediate binary process $Y = \phi_1(X_1)\phi_1(X_2) \cdots$
over $\mathcal{Y} = \{0, 1\}$, while the second parses $Y$ into $Y =
\phi_2(Z_1)\phi_2(Z_2) \cdots$ and outputs $Z = Z_1 Z_2 \cdots$ over $\mathcal{Z} =
\{\gamma_1, \ldots, \gamma_\Gamma\}$.

The conversion rate $\rho$ is the ratio of the length of $Z$ to that
of $X$. More precisely, we define

$$\rho(X, \phi_1, \phi_2) = \limsup_{\ell \to \infty} N(\phi_1(X^\ell), \phi_2)/\ell,$$

where $\phi_1(X^\ell) = \phi_1(X_1) \cdots \phi_1(X_\ell)$ and $N(y^k, \phi_2)$ is the number
of codewords completely parsed by $\phi_2$ for $y^k \in \mathcal{Y}^k$.
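
The encode-then-parse conversion and the count $N(\cdot, \phi_2)$ can be illustrated with a small sketch. The two prefix codes and the source string below are made-up examples, and the function names are hypothetical; the second code is assumed to be a complete prefix code so that parsing never stalls.

```python
def encode(symbols, code):
    """Concatenate the codewords of `code` for a sequence of source symbols."""
    return "".join(code[s] for s in symbols)


def parse(bits, code):
    """Greedily parse a bit string into codewords of a prefix code.

    Returns the decoded symbols; a trailing incomplete codeword is dropped,
    so len(parse(...)) is the number of completely parsed codewords.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    return out


phi1 = {"a": "0", "b": "10", "c": "11"}     # first prefix code (example)
phi2 = {"x": "00", "y": "01", "z": "1"}     # second prefix code (example)

source = "abcab"
y = encode(source, phi1)                    # "01011010"
z = parse(y, phi2)                          # ['y', 'y', 'z', 'y']
rate = len(z) / len(source)                 # finite-length conversion rate, 0.8
```

Here five source symbols produce eight intermediate bits, from which four codewords of the second code are completely parsed; the final bit is left over as an incomplete codeword.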

In this article, we obtain a formula for the conversion rate for an
independent and identically distributed (i.i.d.) source $X$ and
apply it to the problem of combined source and (d,k)-constrained
channel coding.

II.  A  Formula  on  Conversion  Rates 

For fixed $Y = y \in \mathcal{Y}^\infty$, we expect that for a sufficiently long
subsequence $y^m$ ($m \gg 1$), the ratio $N(y^m, \phi_2)/N(y^m, \phi_1)$
is close to the conversion rate. In fact, if $X$ is i.i.d., then
$\{N(Y^m, \phi_1),\ m \ge 0\}$ with $N(Y^0, \phi_1) = 0$ is a renewal
process [4] with mean $E|\phi_1(X)|$, i.e., the average codeword
length of $\phi_1$ for $X$. Hence, the strong law for renewal
processes says $\lim_{m \to \infty} N(Y^m, \phi_1)/m \overset{\text{a.s.}}{=} 1/E|\phi_1(X)|$, where
$\overset{\text{a.s.}}{=}$ means convergence with probability one. However,
$\{N(Y^m, \phi_2),\ m \ge 0\}$ is not a renewal process, since the process
$\{\phi_2(Z_1), \phi_2(Z_2), \ldots\}$ is not always i.i.d., except in the case where
the probability of $X$ is D-adic, that is, each of $p_k = \Pr\{X_i = a_k\}$
equals $D^{-\ell}$ for some $\ell$.

To overcome this difficulty, we look at a Markov
chain which exists behind $Y$. Here, for $i = 1, 2$, let $T_i$ be
a binary tree whose paths from the root to external nodes
uniquely correspond to codewords of $\phi_i$ by labelling 0 and 1
to the left and right branches of internal nodes, respectively.
The set of states of the chain consists of internal nodes of $T_1$.
The state transition probability $P$ can be recursively given
by initially assigning probabilities $p_k$ ($k = 1, \ldots, D$) to the
corresponding external nodes. Then, $Y$ is given as the output
process of the chain which emits the label 0 or 1 according to
the branch passed through in every transition.

Now, construct a joint Markov process with the state set
consisting of pairs of internal nodes of $T_1$ and $T_2$. Its transition
probability is automatically deduced from $P$. Then,
$Y$ can be thought of as the output process of the joint Markov
chain as well. Having considered its stationary probability, we
showed that $\lim_{m \to \infty} N(Y^m, \phi_2)/m \overset{\text{a.s.}}{=} 1/E|\phi_2(Z)|$, where $Z$
is a random variable with the stationary probability of $Z$.

Theorem 1: Given an i.i.d. source $X$, and prefix codes $\phi_1$
and $\phi_2$,

$$\rho(X, \phi_1, \phi_2) \overset{\text{a.s.}}{=} E|\phi_1(X)| / E|\phi_2(Z)|.$$


III. Rate of Combined Source-(d,k) Coding

Let $C_{d,k}$ be the set of binary strings $0 \cdots 01$ consisting of $i$ consecutive zeros followed by a symbol 1, for $i = d, d+1, \ldots, k$. Let $f_{(d,k)}$, or simply $f$, be a one-to-one mapping from $\mathcal{Z}$ to $C_{d,k}$, where $|\mathcal{Z}| = k - d + 1$. After converting the source sequence $X$ to $Z$ through $\phi_1$ and $\phi_2$, we apply $f$ to each symbol in $Z$. We then obtain a $(d,k)$-constrained sequence $f(Z) = f(Z_1) f(Z_2) \cdots$. Let us define the conversion rate $\rho(X, \phi_1, \phi_2, f)$ of the combined coding by

$$\rho(X, \phi_1, \phi_2, f) = \limsup_{l\to\infty} \frac{1}{l} \sum_{z \in \mathcal{Z}} |f(z)|\, N(\phi_1(X^l), \phi_2, z),$$

where $N(y^m, \phi_2, z)$ is the frequency of $\phi_2(z)$ parsed from $y^m$.

Theorem 2 $\rho(X, \phi_1, \phi_2, f) \overset{\text{a.s.}}{=} \rho(X, \phi_1, \phi_2)\, E|f(Z)|$.
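As a minimal sketch of the mapping $f$, the code below builds $C_{d,k}$, assigns its words to a hypothetical three-letter alphabet, and checks that the concatenated output obeys the run-length constraint. The alphabet names and the helper `satisfies_dk` are assumptions made for illustration.

```python
def run_length_words(d, k):
    """C_{d,k}: the words 0^i 1 for i = d, d+1, ..., k."""
    return ["0" * i + "1" for i in range(d, k + 1)]


def satisfies_dk(bits, d, k):
    """Check that every run of zeros closed by a 1 has length in [d, k]
    and that a trailing open run is no longer than k."""
    run = 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            if not d <= run <= k:
                return False
            run = 0
    return run <= k


d, k = 1, 3
alphabet = ["u", "v", "w"]                        # |Z| = k - d + 1 = 3 (hypothetical names)
f = dict(zip(alphabet, run_length_words(d, k)))   # one-to-one map onto C_{d,k}

z = ["v", "v", "w", "v"]
constrained = "".join(f[s] for s in z)            # f(Z_1) f(Z_2) ...
mean_length = sum(len(f[s]) for s in z) / len(z)  # empirical counterpart of E|f(Z)|
```

The empirical mean length is the factor that multiplies the conversion rate in Theorem 2.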


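Finally, we obtain the following theorem.

Theorem 3 There exists a series of pairs of $\phi_1^{(n)}$ and $\phi_2^{(n)} : \mathcal{Z}^n \to \mathcal{Y}^*$ such that

$$\lim_{n\to\infty} \rho(X, \phi_1^{(n)}, \phi_2^{(n)}, f_{(d,k)}) \overset{\text{a.s.}}{=} \frac{H(X)}{\log \lambda},$$

where $\mathcal{Y}^*$ is the set of all finite sequences over $\mathcal{Y}$, $H(X)$ is the entropy rate of $X$, and $\lambda$ is the largest positive root of $\sum_{i=d+1}^{k+1} x^{-i} = 1$.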
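The root $\lambda$ can be computed numerically. The sketch below assumes the standard characteristic equation of the $(d,k)$ constraint, $\sum_{i=d+1}^{k+1} x^{-i} = 1$, and assumes base-2 logarithms with $H(X)$ in bits; the function names are illustrative.

```python
import math


def dk_growth_rate(d, k, tol=1e-12):
    """Largest positive root of sum_{i=d+1}^{k+1} x^{-i} = 1, by bisection.

    The left-hand side is strictly decreasing for x > 0, is at least 1 at
    x = 1, and is below 1 at x = 2, so the root lies in [1, 2].
    """
    def g(x):
        return sum(x ** (-i) for i in range(d + 1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limiting_rate(entropy_bits, d, k):
    """H(X) / log2(lambda): constrained output bits per source symbol (needs d < k)."""
    return entropy_bits / math.log2(dk_growth_rate(d, k))
```

For example, `dk_growth_rate(0, 1)` returns the golden ratio, so a fair binary source would need about `limiting_rate(1.0, 0, 1)` constrained bits per source bit.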
Abstract — This paper deals with the interval algorithm proposed by Han and Hoshi for random number generation, and evaluates the efficiency of the algorithm for each sample path instead of evaluating the overall expectation. We show a theorem in the almost-sure sense that gives bounds on the sup generating rate as well as on the inf generating rate for each sample of input and output processes.

I.  Introduction 

This paper deals with the most general random number generation problem by the interval algorithm [1], where the process of repeated coin tosses and that of repeated random number generations are general processes subject to neither stationarity nor ergodicity but only to consistency restrictions. We are concerned with the case in which the target process should be generated exactly according to the prescribed probability measure, and we concentrate on the almost-sure asymptotic property of the generating rate of each sample, i.e., the number of coin tosses per output sample of the general process. To this end, we introduce the minimum length function, which indicates the length of the shortest prefix of a sample $x \in A^\infty$ from the general source with which the interval algorithm generates the $n$-length prefix of some sample $y \in A^\infty$ subject to the target probability measure. Then we define the sup generating rate and the inf generating rate of each input sample. As a result, we prove a theorem in the almost-sure sense that gives bounds on the sup generating rate as well as on the inf generating rate for each sample of input and output processes.

II.  Basic  Definitions 

(a)  General  sources 

Let $A$ be a finite set and $(A^\infty, \mathcal{F})$ a measurable space, where $A^\infty$ is the set of all strings of infinite length formed from the symbols in $A$, and $\mathcal{F}$ is a $\sigma$-field of subsets of $A^\infty$. Let $\mu$ be a probability measure defined on $(A^\infty, \mathcal{F})$. Then we call $(A^\infty, \mathcal{F}, \mu)$ a probability space. We call $\mu$ a general process [2]. Throughout this article, we assume for $\mu$ neither stationarity nor ergodicity but only consistency restrictions.

An extension of the interval algorithm to general sources was indicated in [1, Remark 12], so we omit the description of the algorithm.

(b)  Inf  generating  rate  and  sup  generating  rate 

The minimum length function $L_n^\mu : A^\infty \to \mathbb{N}$ is defined as the length of the shortest prefix of a sample $x \in A^\infty$ from the general source $\nu$ with which the interval algorithm generates the $n$-length prefix of some sample $y \in A^\infty$ subject to the target probability $\mu$. Here it should be understood that $L_n^\mu(x)$ is defined as $+\infty$ if the above set is empty. We call $L_n^\mu(x)$ the minimum length of $x$. Further, we define the sup generating rate for any source sample $x$ as

$$\overline{l}_\mu(x) = \limsup_{n\to\infty} \frac{1}{n} L_n^\mu(x) \quad \forall x \in A^\infty.$$

Similarly, the inf generating rate is defined as

$$\underline{l}_\mu(x) = \liminf_{n\to\infty} \frac{1}{n} L_n^\mu(x) \quad \forall x \in A^\infty.$$

III.  Main  results 

In addition to the consistency restrictions for $\mu$ and $\nu$, we require the following hypotheses to prove the theorem:

H1: There exists a positive number $\alpha$ such that

$$\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\nu(x^n)} > \alpha \quad \nu\text{-a.s.}$$

H2: There exists a positive number $\beta$ such that

$$\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mu(x^n)} > \beta \quad \mu\text{-a.s.}$$

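Suppose that for the input sample $x \in A^\infty$, the output sample $y \in A^\infty$ is generated by the interval algorithm. Then the following theorem holds.

Theorem:

$$\frac{\underline{h}_\mu(y)}{\overline{h}_\nu(x)} \le \underline{l}_\mu(x) \le \overline{l}_\mu(x) \le \frac{\overline{h}_\mu(y)}{\underline{h}_\nu(x)} \quad \text{a.s.},$$

where $\underline{h}_\nu(x)$ and $\overline{h}_\nu(x)$ (resp. $\underline{h}_\mu(y)$ and $\overline{h}_\mu(y)$) are the inf $\nu$-complexity rate and the sup $\nu$-complexity rate (resp. the inf and sup $\mu$-complexity rates) defined in [2]. In particular, if both processes $\nu$ and $\mu$ are stationary ergodic, then

$$\overline{l}_\mu(x) = \underline{l}_\mu(x) = \frac{h_\mu}{h_\nu} \quad \text{a.s.},$$

where $h_\nu$ (resp. $h_\mu$) denotes the entropy rate of the process $\nu$ (resp. $\mu$).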
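For the i.i.d. special case, the stationary ergodic limit is simply a ratio of entropies. The sketch below is illustrative only: it assumes finite i.i.d. distributions for both the coin and the target, and the function names are not from the paper.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def generating_rate(target_probs, coin_probs):
    """h_mu / h_nu: asymptotic coin tosses per generated output symbol (i.i.d. case)."""
    return entropy_bits(target_probs) / entropy_bits(coin_probs)


# A fair coin driving a uniform four-letter target needs two tosses per output symbol.
fair_to_uniform4 = generating_rate([0.25] * 4, [0.5, 0.5])
```

A biased coin has entropy below one bit, so the same target then requires more tosses per output symbol.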

It should be noted that this theorem extends the results in [3], where only i.i.d. processes were treated.
Abstract — Upper bounds on the reliability function of the Gaussian channel were derived by Shannon in 1959 [1]. Kabatiansky and Levenshtein [2] obtained a low-rate improvement of Shannon’s “minimum-distance bound”. Together with the straight-line bound this provided an improvement upon the sphere-packing bound in a certain range of code rate.

In this work we prove a bound better than the KL bound on the reliability function. Employing the straight-line bound, we obtain a further improvement of Shannon’s results. As intermediate results we prove lower bounds on the distance distribution of spherical codes and a tight bound on the exponent of Jacobi polynomials of growing degree in the entire orthogonality segment.

Let $S^{n-1}$ be a sphere of radius $\sigma\sqrt{An}$ in $\mathbb{R}^n$, where $A$ is the signal-to-noise ratio in the channel and $\sigma^2$ is the variance of the Gaussian noise along each coordinate. A code $W$ is a finite subset of $S^{n-1}$. The number $R = (1/n) \ln |W|$ is called the rate of $W$. Let $d(W) := \min_{x,y \in W,\ x \ne y} \operatorname{dist}(x,y)$ ($0 \le d(W) \le 2\sigma\sqrt{An}$) be the minimum distance of $W$. Suppose that $W$ is used for transmission over the Gaussian channel. Let $P_e(W) = |W|^{-1} \sum_{x \in W} P_e(x)$ be the error probability of $W$ under maximum likelihood decoding. Let

$$P_e(R,A,n) = \min_{W : R(W) \ge R} P_e(W), \qquad E(R,A,n) = -\frac{1}{n} \log P_e(R,A,n).$$
n 

Shannon [1] introduced the function $E(R,A) = \lim_{n\to\infty} E(R,A,n)$ and called it the reliability function of the channel. Computing this function is a central problem of information theory. In the same paper Shannon proved the sphere-packing upper bound on $E(R,A)$ in the form

$$E(R,A) \le \frac{A}{2} - \frac{\sqrt{A}\, g(\theta,A) \cos\theta}{2} - \ln\bigl(g(\theta,A) \sin\theta\bigr) \qquad (1)$$

(here $g(\theta,A) = \frac{1}{2}\bigl(\sqrt{A}\cos\theta + \sqrt{A\cos^2\theta + 4}\bigr)$, $\theta = \arcsin(e^{-R})$), and the minimum-distance bound

$$E(R,A) \le E_{md}(R,A) = (A/8)\, d^2(R),$$

where $d(R)$ is any upper bound on the distance of a code of rate $R$. Kabatiansky and Levenshtein [2] proved a new upper bound on the distance of spherical codes in the form $d(R) \le \delta(\rho(R))$, where $\delta(x) = \sqrt{2}(\sqrt{1+x} - \sqrt{x})/\sqrt{1+2x}$, $\rho(R)$ is the root of the equation $R = (1+\rho) H(\rho/(1+\rho))$, and $H$ is the natural entropy function. Using this bound in $E_{md}(R,A)$ together with the straight-line bound, they improved [2] upon Shannon’s results in a certain range of code rates.
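The minimum-distance bound with the Kabatiansky-Levenshtein distance estimate is easy to evaluate numerically. The sketch below assumes natural logarithms throughout and uses the identity $(1+\rho)H(\rho/(1+\rho)) = (1+\rho)\ln(1+\rho) - \rho\ln\rho$; the function names are illustrative.

```python
import math


def delta(x):
    """delta(x) = sqrt(2) (sqrt(1+x) - sqrt(x)) / sqrt(1+2x)."""
    return math.sqrt(2.0) * (math.sqrt(1.0 + x) - math.sqrt(x)) / math.sqrt(1.0 + 2.0 * x)


def kl_rate(rho):
    """(1 + rho) H(rho / (1 + rho)) with H the natural-log entropy function."""
    if rho == 0.0:
        return 0.0
    return (1.0 + rho) * math.log(1.0 + rho) - rho * math.log(rho)


def rho_of_rate(R, tol=1e-12):
    """Root rho(R) of R = (1 + rho) H(rho / (1 + rho)), by bisection.

    kl_rate is increasing in rho, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while kl_rate(hi) < R:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if kl_rate(mid) < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e_md(R, A):
    """Minimum-distance bound (A/8) d^2(R) with d(R) = delta(rho(R))."""
    return (A / 8.0) * delta(rho_of_rate(R)) ** 2
```

As $R \to 0$ the distance estimate tends to $\sqrt{2}$, so the bound approaches $A/4$.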

The  main  result  of  the  present  paper  is  the  following  theo¬ 
rem. 


Theorem  1  The  reliability  function  of  the  Gaussian  channel 
with  signal-to-noise  ratio  A  satisfies  the  upper  bound 


d2 

E(R,A)£  min  max  min(A  —  ,A 

0<p<p(R)  w,d  v  8 


where  R  is  a  value  of  the  code  rate, 


and 


0  <  d  S(p(R)),  d  sg  w  <  8(p), 

.  .  (  Ad2w2  1  2 

A(w)=nun{H4w2_d2),F(l--w  ,p)j, 

F(x,p)^R-(l+p)H(T^~) 


+  ln(-(x  +  V(1  +  2 p)2x2  -  4p(l  -  p))) 


+  (1  +  2 p)  In 


(1  -I-  2 p)x  +  y/(\  -I-  2 p)2x2  -  4p(l  -  p) 

2(1  +P) 


Bound  (2)  is  better  than  the  minimum-distance  bound  on 
E(R,A)  of  [2].  Together  with  the  straight-line  bound  related 
to  it,  (2)  also  improves  upon  the  sphere-packing  exponent  (1) 
in  a  larger  range  of  code  rates  than  the  results  in  [2]. 

The proof consists of the following four steps:

- a general theorem on the distance distribution of spherical codes, which carries over to the spherical case the techniques developed recently in [3], [4];

- asymptotic bounds on the exponent of Jacobi polynomials $P_k$ (this is an extension of a result in [2] on the asymptotics of the extremal zero of $P_k$);

- asymptotic bounds on the distance distribution of spherical codes; and

- a lower bound on the error probability of decoding for a code with a known distance distribution.
Abstract — In this paper we develop AWGN coding theorems for ensembles of codes for which we can calculate, or at least closely estimate, the ensemble weight enumerator. As a rule, for such an ensemble we can find a threshold $c$ such that if $E_b/N_0 > c$, then the ensemble maximum-likelihood error probability approaches zero. This threshold is always better, and usually much better, than can be obtained from the union bound. The role of low-weight codewords is key.


where $\delta = h/n$.

For technical reasons, we need to make the following two assumptions about the behavior of $\overline{A}_h^{(n)}$ for $h = o(n)$. Both assumptions say, roughly, that there are not too many words of low weight in the ensemble.

• Assumption 1: There exists a sequence of integers $d_n$ such that $d_n \to \infty$ and

$$\lim_{n\to\infty} \sum_{h=1}^{d_n} \overline{A}_h^{(n)} = 0.$$


I. Introduction

Coding theory has been revolutionized by the discovery that certain random ensembles of codes (“turbo”-style codes, LDPC codes, and their relatives) can be effectively decoded with iterative message-passing algorithms. Of course, a random ensemble is a candidate for iterative decoding only if it has the potential for good performance, as measured by its maximum-likelihood decoding performance. In this paper we will develop a technique for finding the ML potential for a broad class of random ensembles on the AWGN channel. (A weaker, but more broadly applicable, technique is the subject of a companion paper [2].)


II.  Ensemble  Weight  enumerators 


By an ensemble of linear codes we mean a sequence $\mathcal{C}_{n_1}, \mathcal{C}_{n_2}, \ldots$ of sets of linear codes of a common rate $R$, where $\mathcal{C}_{n_i}$ is a set of $(n_i, k_i)$ codes with $k_i/n_i = R$. We assume that the sequence $n_1, n_2, \ldots$ approaches infinity. If $C$ is an $(n,k)$ code in the ensemble, we denote the weight enumerator of $C$ by the list $A_0(C), A_1(C), \ldots, A_n(C)$. The average weight enumerator for the set $\mathcal{C}_n$ is defined as the list

$$\overline{A}_0^{(n)}, \overline{A}_1^{(n)}, \ldots, \overline{A}_n^{(n)},$$

where

$$\overline{A}_h^{(n)} = \frac{1}{|\mathcal{C}_n|} \sum_{C \in \mathcal{C}_n} A_h(C) \quad \text{for } h = 0, 1, \ldots, n.$$

Also, we define the ensemble spectral shape:

$$r(\delta) = \lim_{n\to\infty} \frac{1}{n} \log \overline{A}_{\lfloor \delta n \rfloor}^{(n)} \quad \text{for } 0 < \delta < 1,$$

assuming that the limit exists. In this case, we may write

$$\overline{A}_h^{(n)} = e^{n(r(\delta) + o(1))},$$

1The work of these authors was supported by the National Aeronautics and Space Administration.

2The work of these authors was supported by NSF grant no. CCR-9804793, and grants from Sony and Qualcomm.


•  Assumption  2: 

lim  <  0. 

It is our goal to prove an AWGN coding theorem for such an ensemble, i.e., a theorem that says that if $E_b/N_0$ exceeds a certain threshold, then $\overline{P}_W^{(n)} \to 0$ as $n \to \infty$, where $\overline{P}_W^{(n)}$ denotes the ensemble average probability of (word) error for a maximum-likelihood decoder. In the next section, we will give a formula for such a threshold based on a recent result of Divsalar.


III.  Definition  of  the  Divsalar  Threshold 

For an ensemble of the type discussed in Section II, we can define the Divsalar threshold as follows:

$$c_0 = \sup_{0 < \delta < 1} \frac{1 - \delta}{\delta} \cdot \frac{1 - e^{-2 r(\delta)}}{2}.$$

The derivation of this threshold is explained in [1]. (The threshold corresponding to the union bound is $c_U = \sup_{0 < \delta < 1} r(\delta)/\delta$.)
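The two thresholds can be compared numerically for a given spectral shape. The sketch below rests on two assumptions that are not spelled out in the text: the threshold formulas are taken in their usual published form, $\sup_\delta \frac{1-\delta}{\delta}\cdot\frac{1-e^{-2r(\delta)}}{2}$ and $\sup_\delta r(\delta)/\delta$, and the example shape is that of the random linear ensemble, $r(\delta) = H(\delta) - (1-R)\ln 2$ in nats.

```python
import math


def entropy_nats(x):
    """Binary entropy function in nats, for 0 < x < 1."""
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def random_code_shape(delta, rate):
    """Assumed spectral shape of the random linear ensemble of the given rate."""
    return entropy_nats(delta) - (1.0 - rate) * math.log(2.0)


def thresholds(shape, steps=10000):
    """Grid-search estimates of the Divsalar and union-bound thresholds."""
    c0 = cu = -math.inf
    for i in range(1, steps):
        d = i / steps
        r = shape(d)
        c0 = max(c0, (1.0 - d) / d * (1.0 - math.exp(-2.0 * r)) / 2.0)
        cu = max(cu, r / d)
    return c0, cu


c0, cu = thresholds(lambda d: random_code_shape(d, 0.5))
```

Because $(1 - e^{-2r})/2 < r$ for $r > 0$ and $1 - \delta < 1$, the first value is always below the second, in line with the claim that the threshold improves on the union bound.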

IV.  The  Main  Theorem. 

Theorem. For an ensemble of rate $R$ which satisfies the two assumptions cited in Section II, if $E_b/N_0 > (1/R) c_0$, then

$$\lim_{n\to\infty} \overline{P}_W^{(n)} = 0.$$

Using the result of this theorem, and the known expressions for $r(\delta)$ for the ensembles of low-density parity-check codes and “repeat-accumulate” codes [3], we can obtain very good values for the ML thresholds for these codes.
Abstract — Decoders employing the generalized likelihood ratio test can achieve the rates that are achievable by maximum likelihood decoders on ISI channels, even though they are ignorant of the channel characteristics.

I.  Introduction 

A  variety  of  communication  systems  can  be  modeled  accu¬ 
rately  by  an  intersymbol  interference  (ISI)  channel.  In  many 
situations,  however,  the  exact  nature  of  the  interference  may 
not  be  known  at  the  time  of  the  system  design.  In  this  note,  we 
consider  the  performance  of  a  particular  decoding  rule  that, 
in  contrast  to  maximum  likelihood  (ML),  operates  with  im¬ 
precise  knowledge  of  the  channel. 

The question of the existence of universal decoders for the ISI channel has been previously addressed [1]. It is known that universal decoders do exist for this class of channels. However, the existence proof suggests a very complicated construction, one that requires considering all ML decoders for all the possible ISI channels and forming a “merging” of these decoders into a single universal one. As such, the complexity of the evaluation of any particular codeword is very high.

For the case of discrete memoryless channels the situation is simpler. The maximum mutual information decoder, first suggested in [2] and widely popularized in [3], employs a relatively simple decoding rule: given a received sequence $y$ and a candidate codeword $x$, compute a score $\max_Q Q(y|x)$, where the maximization is taken over all DMC probability laws. The decoder then chooses the codeword with the highest score. It is known that this decoding rule is universal. Even though the cost of codeword evaluation is higher than that of maximum likelihood decoding, it is still much lower than that of universal decoders based on merging.

The natural generalization of the above decoding rule leads to the so-called “Generalized Likelihood Ratio Test” (GLRT): let the possible channels be parametrized by $\theta$, with $P_\theta$ denoting the probability law of the corresponding channel. Given a received sequence $y$, compute the score of $x$ as $\max_\theta P_\theta(y|x)$, and choose the codeword with the highest score.

That  universal  decoders  do  exist  for  the  ISI  channel  does 
not  imply  that  the  GLRT  decoder  performs  well;  there  are 
classes  of  channels  for  which  there  exists  a  universal  decoder, 
but  GLRT  performs  poorly  [4].  In  this  presentation,  we  will 
investigate  the  performance  of  the  GLRT  on  ISI  channels.  In 
particular  we  will  show  that  as  far  as  achievable  rates  are  con¬ 
cerned,  the  GLRT  decoder  performs  as  well  as  the  maximum 
likelihood  decoder. 

II.  Results 

If  the  spectral  characteristics  of  an  ISI  channel  are  known  in 
advance,  the  codebook  used  over  this  channel  will  be  designed 
accordingly;  in  particular,  the  capacity  of  the  channel  can  be 
achieved  via  water  pouring.  Since  we  assume  that  the  ISI  co¬ 
efficients  are  not  known  in  advance,  we  will  consider  the  case 


Emre  Telatar 
EPFL  -  DSC  -  LTHI 
CH-1015  Lausanne 
Switzerland 

Emre.Telatar@epfl.ch

in  which  the  codewords  are  chosen  to  have  a  flat  spectrum. 
We  will  content  ourselves  by  considering  the  rates  achievable 
by  GLRT  decoders  and  ML  decoders  when  the  codebook  is 
chosen  as  such.  Since  the  codebook  is  not  spectrally  matched 
to  the  channel  we  have  no  hope  of  achieving  the  true  capac¬ 
ity  of  the  channel;  but  we  feel  that  to  have  assumed  that 
the  transmitter  is  designed  with  the  knowledge  of  the  channel 
whereas  the  receiver  is  not  would  have  been  artificial.  In  all 
the  cases  considered  below,  the  transmitter  is  subject  to  an 
average  power  constraint  P. 

We will assume that the channel filter $a$ has at most a given duration $J$, and that the output of the channel at time $k$ is related to the channel input $x$ via

$$Y_k = (a \star x)_k + Z_k, \quad k = 1, \ldots, n,$$

where $\star$ denotes cyclic convolution and the $Z_k$ are i.i.d. circularly symmetric Gaussian random variables with $E[Z_i] = 0$, $E[|Z_i|^2] = 1$. The use of cyclic convolution is motivated by analytical convenience, but can be justified by prepending to each codeword its last $J$ symbols.

In addition to assuming that $a_k = 0$ for $k > J$, we will further assume that the filter $a$ satisfies a norm constraint

$$\|a\|^2 \le H.$$

The GLRT decoder then works as follows: given a received $y$, it assigns to each codeword $x$ a score $\min_a \|y - a \star x\|$, where the minimum is taken over all filters $a$ of at most $J$ taps that satisfy the energy constraint $\|a\|^2 \le H$. The decoder then declares the codeword of smallest score.

We show that for randomly chosen codes (with independently chosen codewords, each codeword chosen either uniformly on the sphere or with i.i.d. Gaussian components), the error probability for the GLRT decoder decays to zero as long as the code rate is less than

$$\int_0^1 \log\bigl(1 + P |a(\theta)|^2\bigr)\, d\theta,$$

where $a(\theta) = \sum_k a_k e^{i 2\pi \theta k}$ is the Fourier transform of the channel impulse response. We thus see that for ISI channels, GLRT decoders can achieve all rates the ML decoder can.
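The rate expression can be evaluated numerically for any finite tap vector. The sketch below assumes taps indexed from $k = 0$ and natural logarithms, and approximates the integral by a midpoint rule; the function names are illustrative.

```python
import cmath
import math


def frequency_response(taps, theta):
    """a(theta) = sum_k a_k exp(i 2 pi theta k), with taps indexed from k = 0."""
    return sum(a * cmath.exp(2j * math.pi * theta * k) for k, a in enumerate(taps))


def flat_spectrum_rate(taps, power, steps=4096):
    """Midpoint-rule estimate of the integral over [0, 1] of log(1 + P |a(theta)|^2)."""
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) / steps
        total += math.log(1.0 + power * abs(frequency_response(taps, theta)) ** 2)
    return total / steps


# A single unit tap has no intersymbol interference, so the rate is log(1 + P).
no_isi_rate = flat_spectrum_rate([1.0], 3.0)
```

A pure delay such as `[0.0, 1.0]` has the same magnitude response as a single unit tap and therefore yields the same rate.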
Abstract — In this work, we present a new lower bound on the feedback capacity of the colored Gaussian noise channel. Under the assumption of large power, this lower bound is shown to be strictly larger than the non-feedback capacity. Insight into the role of the feedback and the capacity-achieving strategy has been obtained.

I.  Introduction 

In  [1],  Cover  and  Pombra  have  characterized  the  feedback 
capacity  of  the  discrete-time,  real,  colored  Gaussian  noise 
channel  with  noiseless  feedback,  but  their  expression  still  in¬ 
volves  an  unsolved  maximization  problem.  Since  then,  most 
of  the  work  has  concentrated  on  finding  upper  bounds  on  the 
capacity.  The  only  lower  bound  published  in  the  literature 
says  that  the  feedback  capacity  cannot  be  smaller  than  the 
non-feedback  capacity.  To  our  knowledge,  a  specific  feedback 
strategy,  applicable  for  any  noise  process,  that  shows  that  the 
feedback  capacity  is  strictly  larger  (when  this  is  actually  the 
case)  than  the  non-feedback  capacity  has  not  been  published 
in  the  literature  prior  to  our  work. 


where  the  maximization  is  taken  over  all  feedback  strategies 
of  the  form:  X  =  U  +  F  Z  where  U  is  a  Gaussian  vector  and 
is  independent  of  the  noise  process  Z  and  F  is  a  strictly  lower 
triangular  matrix,  since  the  feedback  has  to  be  strictly  causal. 
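A small numerical sketch of a linear feedback strategy $X = U + FZ$ for $N = 2$ follows. It assumes the standard determinant-ratio rate expression $\frac{1}{2N}\log_2\bigl(|K_{X+Z}|/|K_Z|\bigr)$ with $K_{X+Z} = (F+I)K_Z(F+I)^T + K_U$, which holds when $U$ is independent of $Z$; the helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def feedback_rate(k_u, f, k_z):
    """Return (rate in bits per use, average power) for X = U + F Z with N = 2."""
    assert f[0][0] == 0 and f[0][1] == 0 and f[1][1] == 0, "F must be strictly lower triangular"
    n = 2
    identity = [[1.0, 0.0], [0.0, 1.0]]
    f_plus_i = add(f, identity)
    k_y = add(matmul(matmul(f_plus_i, k_z), transpose(f_plus_i)), k_u)  # covariance of X + Z
    k_x = add(matmul(matmul(f, k_z), transpose(f)), k_u)                # covariance of X
    rate = math.log2(det2(k_y) / det2(k_z)) / (2 * n)
    power = (k_x[0][0] + k_x[1][1]) / n
    return rate, power


# White noise and no feedback: the rate reduces to 0.5 * log2(1 + P).
rate, power = feedback_rate([[3.0, 0.0], [0.0, 3.0]],
                            [[0.0, 0.0], [0.0, 0.0]],
                            [[1.0, 0.0], [0.0, 1.0]])
```

Any candidate pair $(K_U, F)$ must keep the returned average power within the constraint $P$ for the returned rate to count as a lower bound.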


IV.  New  Lower  Bound  on  Feedback  Capacity 

Determining the feedback capacity reduces to a joint optimization problem over $K_U$ and $F$, which is not easily solved in closed form. However, by fixing a strictly lower triangular matrix $F$ and finding the optimal $K_U$ for that given $F$, we obtain a lower bound on the feedback capacity, parametrized by $F$. Under the assumption of large enough power $P$, we obtain

$$C_{N,fb}(P) \ge \frac{1}{2N} \log \left\{ \frac{\bigl(P + \frac{1}{N}\operatorname{tr}(K_Z) + \frac{2}{N}\operatorname{tr}(F K_Z)\bigr)^N}{|K_Z|} \right\} \qquad (3)$$

By choosing a particular feedback matrix $F$, it is shown that the feedback capacity is strictly larger than the non-feedback capacity:


II.  Problem  formulation 

The single-user, discrete-time, colored Gaussian noise channel is described by the equation relating the transmitted signal $X[n]$ to the received signal $Y[n]$ at time $n$:

$$Y[n] = X[n] + Z[n] \qquad (1)$$

where $Z[n]$ is the Gaussian noise at time $n$. We will consider transmission over this channel for $N$ time steps and assume that $Z[1], \ldots, Z[N]$ are jointly Gaussian and zero-mean with covariance matrix $K_Z$. Without feedback, $X[n]$ is only a function of the message $U$ to be transmitted; with feedback, however, $X[n]$ may also depend on past values of the noise process $Z[1], \ldots, Z[n-1]$. The transmitter is power-constrained by $\sum_{n=1}^{N} E[X[n]^2] \le NP$.

III.  Previous  work 

Let $C_{N,fb}(P)$ be the capacity in bits per transmission over $N$ time steps if feedback is available and the transmitter is constrained to an average power $P$. The expression for the feedback capacity and the form of the capacity-achieving feedback strategy are determined as follows in [1]:

$$C_{N,fb}(P) = \max_{\operatorname{tr}(K_X) \le NP} \frac{1}{2N} \log \left\{ \frac{|K_{X+Z}|}{|K_Z|} \right\} \qquad (2)$$

xThis  work  was  conducted  under  Lincoln  Laboratory  contract 
BX-7036. 


Theorem 1 For any noise covariance matrix $K_Z$ of a colored noise process, let $F = F_{LLSE}$ be the linear least squares prediction matrix. The difference between the feedback and the non-feedback capacity is then bounded below by:

Cn,fb(P)  -  C„(P)  >  1  log  {l  +  }  >  0 

(4) 

In  this  feedback  strategy,  the  linear  least  squares  prediction 
of  the  noise  is  added  to  the  information  part  of  the  signal 
to  form  the  transmitted  signal.  Note  that  this  prediction  is 
known  at  the  transmitter,  but  not  at  the  receiver.  And  it  turns 
out  that  this  strategy  gives  the  receiver  added  information 
about the noise. It is surprising that whitening the effective noise process encountered by the information signal is not a beneficial strategy. The new lower bound can be further tightened by introducing an amplification factor, that is, by considering $F = \alpha F_{LLSE}$ with $\alpha > 1$. The tightest bound achievable through our family of strategies is obtained by increasing $\alpha$ until the assumption of large enough power is violated.
Abstract — A new signature scheme, the traitor traceable signature scheme, is presented. It allows the signer to convince any arbiter of the recipient’s infringement if the recipient illegally distributes the signature he received. We use the techniques of a proof of knowledge of a discrete logarithm [1][2], identification of a double spender in off-line electronic cash [3][4], and a signcryption scheme [5]. Our scheme consists of three moves and is more compact and efficient than the previous scheme [6], because it eliminates the cumbersome cut-and-choose-like techniques. Moreover, our accusation protocol does not require the private key of the recipient of the signature, i.e., the signer can convince any arbiter of the recipient’s infringement without the help of the original recipient.

I.  Introduction 

In a conventional digital signature scheme, after issuing a digital document with his signature, the signer cannot convince anyone of who has leaked the signed document, since the signer himself can reproduce it arbitrarily. Recently, [6] proposed applying the traitor tracing technique [7][8] to a message with a signature in order to prevent its illegal proliferation. This approach is effective when both the message and the signature are valuable to anyone.

However, the method of [6] is not efficient in communication and computation, because it involves the cumbersome cut-and-choose-like technique. Moreover, [6] has the following two problems: 1) the accusation protocol requires the private key of the recipient of the signature, so if the recipient is not available, the arbiter cannot decide on the accusation; 2) after the accusation protocol, the signer learns the complete signature, which is known only to the recipient before the accusation. This means that [6] is not robust against a signer making wrong accusations.

II.  Traitor  traceable  signature 

In this paper, we propose a new signature scheme, the traitor traceable signature, which solves several problems of [6]: 1) if the recipient illegally distributes the signature he received, our scheme allows the signer to convince any arbiter of the recipient’s infringement; 2) we use the techniques of a proof of knowledge of a discrete logarithm, identification of a double spender in off-line electronic cash, and a signcryption scheme, which are well regarded as (provably) secure; 3) our scheme consists of three moves and is more compact and efficient than the previous scheme [6], because it eliminates the cumbersome cut-and-choose-like techniques; 4) our accusation protocol does not require the private key of the recipient of the signature (the signer can convince any arbiter of the recipient’s infringement without the help of the original recipient); 5) the signature can be generated for the recipient only once per execution of the protocol, in order to prevent the signer from making wrong accusations.

Table  1:  Traitor  Traceable  Signature  Scheme 

Alice  Bob 

x  €  R  zq 
X  <-  91 

(x)  4-  <A\  sigA  (X))  (x), 


verify  inif^  (i) 

1  «  r2  •  u  1  €r  z<? 


Ho 


V1 


v2 


■j  —  t  H  (  R  j  )  mod  q 
2  —  jH[R 2)  mod  q 
i  -  ;u]  mod  q 
H  (/ ,  J,  ,  H2, 

O-l  ,  <T2,  U]  .  U,  ) 
«  —  j  u 2  mod  q 


W'(« 


eqR(i,  x) 


verify  reqB{i.x)  by 
Rj  I  /"(H,  ) 

R2  =  g<*  2  jB(R2) 

I  =  gv 1 Jv 1 

I  ~  VJU  2 
generate  (c,  r.  j)  by 

V1  -*  <-1  .  ‘-2 

r  =  KHfc2(m) 

3  =  x  /  (r  A-  xa  )  mod  q 
c  ~  Ekl  (rn) 


r)  =  </.  J,  R  j  , 
R  2  ,  <7  j  ,  2  • 

.  «2.  V) 

eqB(i.x) 

*-  ( W(i .  x). 

3igB(W(i,x))) 


obtain  ( c,  r,  3  .  t>2  ) 
Abstract — An efficient traitor tracing scheme for broadcast encryption is proposed. Its security depends on the difficulty of the discrete logarithm problem and is equivalent to that of the ElGamal public-key cryptosystem even when subscribers collude. The proposed scheme is the first one that satisfies all the following features: the tracing algorithm is black-box tracing; all the traitors are identified from a captured pirate decoder; the data supplier can encrypt the contents such that only a specific subset of subscribers’ decoders can decrypt them; the encryption algorithm is public-key.

I.  Introduction 

A broadcast distribution system (BDS) is a system in which the data supplier broadcasts the contents in encrypted form and gives each subscriber a decoder containing a secret decryption key, e.g., pay-TV. For a BDS, there are two requirements: at least one traitor can be identified from a captured pirate decoder constructed by $t$ or fewer traitors; and the data supplier can prevent $s$ or fewer subscribers from decrypting the broadcast contents. Here, $s$ and $t$ are parameters. To the authors’ knowledge, only the BDS in [1] satisfies both $s > 0$ and $t > 0$, but it is not efficient. We construct an efficient BDS by restricting its security to computational security and the parameters to $s = t$. The proposed BDS is based on the BDS with $s = 0$ and $t > 0$ in [2] and uses the idea of the group key distribution scheme in [3] to make $s > 0$. Compared with the BDS in [2], the proposed BDS can identify traitors even if a captured pirate decoder is used only as a black box. Note that group key distribution does not need to trace traitors; indeed, tracing is not discussed in [3].

II.  Proposed  System 

We label all the $n$ subscribers from 1 to $n$. The set of all the $n$ subscribers is denoted by $\Phi$, i.e., $\Phi = \{1, 2, \ldots, n\}$. Let $p$ and $q$ be prime numbers with $q \mid p - 1$. The multiplicative group of order $p - 1$ is denoted by $Z_p^*$. Let $g$ be a $q$-th root of unity, and let $G_q$ denote the subgroup of $Z_p^*$ of order $q$, i.e., $G_q = \{g^z : 0 \le z < q\}$. Let $I_q$ denote the set of nonnegative integers less than $q$. All the subscribers and the data supplier agree on the prime numbers $p$, $q$ and the generator $g$.

The secret decryption key for subscriber $i$ is $d_i = (i, f(i))$, where $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_t x^t \bmod q$ with $a_0, a_1, \ldots, a_t \in I_q$. The encryption key is $e = (p, g, y_0, y_1, \ldots, y_t)$, where $y_i = g^{a_i}$ with $0 \le i \le t$.

The broadcast contents in encrypted form consist of an
enabling part and a cipher part. The cipher part is the symmetric
encryption of the contents under a session key. For
each distribution, a session key is chosen randomly. Let A
denote the set of t or fewer subscribers whom the data supplier
prevents from obtaining the contents encrypted under
the session key k_s. For k_s and A = {x_1, x_2, ..., x_{|A|}}, the
data supplier generates the enabling part, denoted B(k_s, A), as

  (g^r, k_s y_0^r, (x_1, g^{r f(x_1)}), (x_2, g^{r f(x_2)}), ..., (x_t, g^{r f(x_t)})),

where r is a random element in I_q and relatively prime to p − 1, and
every x_i with |A| < i ≤ t is chosen from I_q \ (Φ ∪ {0}). The value
g^{r f(x_i)} can be computed from e and r by

  g^{r f(x_i)} = (y_0 × y_1^{x_i} × y_2^{x_i^2} × ··· × y_t^{x_i^t})^r.

For B(k_s, A), only a subscriber m with m ∉ A can compute
g^{r f(0)} by applying the Lagrange interpolation formula for f(x)
implicitly in the exponent of g^r, and thereby obtain k_s. Because
y_0 = g^{a_0} = g^{f(0)}, dividing k_s y_0^r by g^{r f(0)} recovers k_s.
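
The steps above can be sketched in Python for toy parameters. The values p = 23, q = 11, g = 2, the function names, and the choice of r from {1, ..., q − 1} are assumptions made for illustration only; they are not taken from the paper and are far too small to be secure.

```python
import random

# Toy parameters (illustrative assumption): q | p - 1 and g has order q modulo p.
P, Q, G = 23, 11, 2

def keygen(t, n, rng):
    """Return ({i: d_i}, [y_0, ..., y_t]) for a random f of degree at most t."""
    coeffs = [rng.randrange(Q) for _ in range(t + 1)]
    f = lambda x: sum(a * pow(x, j, Q) for j, a in enumerate(coeffs)) % Q
    secret = {i: (i, f(i)) for i in range(1, n + 1)}
    public = [pow(G, a, P) for a in coeffs]
    return secret, public

def enabling_part(ks, excluded, public, n, rng):
    """Build B(k_s, A) from the public key only."""
    t = len(public) - 1
    r = rng.randrange(1, Q)  # simplified choice of r (assumption)
    # Pad A with dummy points outside the subscriber set and {0}.
    dummies = rng.sample(range(n + 1, Q), t - len(excluded))
    shares = []
    for x in list(excluded) + dummies:
        acc = 1
        for j, y in enumerate(public):  # prod_j y_j^(x^j) = g^f(x)
            acc = acc * pow(y, pow(x, j, Q), P) % P
        shares.append((x, pow(acc, r, P)))  # (x, g^(r f(x)))
    return pow(G, r, P), ks * pow(public[0], r, P) % P, shares

def decrypt(key, part):
    """Recover k_s by Lagrange interpolation at 0 in the exponent of g^r."""
    m, fm = key
    gr, blinded, shares = part
    points = [(m, pow(gr, fm, P))] + shares  # t + 1 values g^(r f(x))
    xs = [x for x, _ in points]
    if len(set(xs)) < len(xs):
        raise ValueError("subscriber is excluded")
    acc = 1
    for xj, vj in points:
        lam = 1
        for xk in xs:
            if xk != xj:  # Lagrange coefficient at 0, modulo q
                lam = lam * xk * pow((xk - xj) % Q, -1, Q) % Q
        acc = acc * pow(vj, lam, P) % P  # accumulates g^(r f(0))
    return blinded * pow(acc, -1, P) % P
```

An excluded subscriber contributes a point that already appears in the enabling part, so it holds only t distinct points of a degree-t polynomial and the interpolation cannot be carried out; the sketch signals this with an exception.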

Suppose a pirate decoder constructed by a coalition C of t or
fewer traitors is captured. Even if the pirate decoder is used only
as a black box, the traitors can be identified as follows. For
every set A of t subscribers, generate the enabling
part B(k_s, A), where k_s is chosen uniformly at random from G_q.
Give every generated enabling part to the pirate decoder. Suppose
that the pirate decoder does not output the session key
for the l enabling parts B(k_{s1}, A_1), B(k_{s2}, A_2), ..., B(k_{sl}, A_l).

The set of all the traitors is then ∩_{i=1}^{l} A_i, under the same assumption
as in [1], namely that a pirate decoder outputs the contents as
long as the input is an enabling part and at least one traitor is
not in A. The reason is that C ⊆ A_i for every A_i with 1 ≤ i ≤ l,
and there is no C' such that C ⊂ C' ⊆ A_i for all 1 ≤ i ≤ l under
the above assumption.
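
The tracing procedure can be illustrated with a short Python sketch. The pirate decoder is modelled here as an idealised oracle that fails exactly when every traitor is excluded; this model, the function name and the example coalition are assumptions for illustration.

```python
from itertools import combinations

def trace_traitors(n, t, decoder_fails):
    """Intersect all t-subsets A for which the black-box decoder outputs no key."""
    failing = [set(a) for a in combinations(range(1, n + 1), t)
               if decoder_fails(set(a))]
    return set.intersection(*failing) if failing else set()

# Idealised pirate decoder (assumption): it fails exactly when the whole
# coalition C is contained in the excluded set A.
traitors = {2, 5}
found = trace_traitors(6, 3, lambda a: traitors <= a)
print(found)
```

The procedure queries the decoder once per t-subset of the n subscribers, so the number of queries is the binomial coefficient C(n, t).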

III.  Discussion  on  Efficiency  and  Security 

When  evaluating  a  BDS,  two  complexity  measures  are  to 
be  considered:  the  size  of  an  enabling  part  and  that  of  a  secret 
decryption key. The enabling part consists of 2t + 2 elements
in Z_p^* and each secret decryption key consists of two elements
in Z_p^*. The proposed system is much more efficient than the
BDS in [1] where those are O(t^2) and O(t^6), respectively, and
as  efficient  as  the  most  efficient  system  in  [2]  where  those  are 
t  +  1  and  1,  respectively. 

Even if the encryption key e is made public, for every A
with |A| ≤ t, computing k_s is shown to be as hard for A as
computing a plaintext in the ElGamal
public-key cryptosystem over Z_p^*. For C, obtaining
a secret decryption key (u, f(u))
with u ∉ C, when given e and the traitors’ secret decryption keys
d_i with i ∈ C, is shown to be as hard as the discrete logarithm
problem over Z_p^*.

Abstract — We say that a broadcast encryption
scheme is a (c, N)-Broadcast Exclusion Scheme (BEx)
if a center can exclude c or fewer users among N users.
In this paper, we present an efficient (c, N)-BEx which
has inherently large traceability by showing a new
construction of cover free families.

I  Previous  works 

Assume that there exist secure block ciphers. It was recently
shown by Kumar et al. [1] and the authors [2], independently, that
a (c, N)-BEx is obtained from a cover free family. Kumar
et al. also presented a construction of cover free families such
that overhead = O(c^2), which is independent of N, by using
algebraic geometry codes, where overhead = (the length of a
ciphertext)/(the length of a plaintext).

A set system is a pair (X, B), where X = {1, 2, ..., v} and
B is a set of blocks B_i ⊆ X with i = 1, 2, ..., N. We consider
a set system such that |B_i| = k for i = 1, 2, ..., N.

Definition 1.1 [3] We say that (X, B) is a (v, N, k, c, D)-
cover free family if |B_{i_0} \ ∪_{j=1}^{c} B_{i_j}| ≥ D for all B_{i_1}, ..., B_{i_c}
and for all B_{i_0} ∉ {B_{i_1}, ..., B_{i_c}}.
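
As a small illustration of Definition 1.1, the following Python sketch computes, by brute force, the largest D for which a given set system is cover free against c blocks. The function name and the example blocks are hypothetical.

```python
from itertools import combinations

def cover_free_margin(blocks, c):
    """Smallest size of B_i0 minus the union of c other blocks, over all choices."""
    worst = None
    for i0, b0 in enumerate(blocks):
        others = [b for i, b in enumerate(blocks) if i != i0]
        for group in combinations(others, c):
            left = len(b0 - set().union(*group))
            worst = left if worst is None else min(worst, left)
    return worst

# Hand-made example with v = 6, N = 4, k = 3.
blocks = [{1, 2, 3}, {3, 4, 5}, {5, 6, 1}, {2, 4, 6}]
margin_c1 = cover_free_margin(blocks, 1)
margin_c2 = cover_free_margin(blocks, 2)
```

In this example any two blocks share exactly one point, so one other block leaves two points of a block uncovered and two other blocks leave one.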

On  the  other  hand,  a  broadcast  encryption  scheme  is  said 
to  have  c-traceability  if  when  a  set  of  at  most  c  users  (who  are 
not  necessarily  excluded)  pool  their  keys  together  to  construct 
a  “pirate  decoder”,  at  least  one  of  the  users  (a  traitor)  involved 
can  be  identified  from  the  decoder  [4]. 

Definition 1.2 [5] We say that (X, B) is a c-(v, N, k) traceable
set system if for all B_{i_1}, ..., B_{i_c} and for all B_{i_0} ∉
{B_{i_1}, ..., B_{i_c}},

  |F ∩ B_{i_0}| < max_{1≤j≤c} |F ∩ B_{i_j}|    (1)

for any F ⊆ ∪_{j=1}^{c} B_{i_j} such that |F| = k.

In the c-traceability scheme, B_i is the key of user i and
F corresponds to the pirate key. From Eq. (1), we see that a
traitor is detected by computing max_i |F ∩ B_i|.
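
The detection rule can be sketched in a few lines of Python. The function name and the small example family are hypothetical and serve only to illustrate the maximum-overlap computation.

```python
def trace_by_overlap(pirate_key, user_blocks):
    """Return the users whose block overlaps the pirate key F the most."""
    overlap = {u: len(pirate_key & b) for u, b in user_blocks.items()}
    best = max(overlap.values())
    return sorted(u for u, size in overlap.items() if size == best)

# Hand-made example with k = 4; users 1 and 2 pool their keys.
user_blocks = {1: {1, 2, 3, 4}, 2: {5, 6, 7, 8}, 3: {1, 9, 10, 11}}
accused_a = trace_by_overlap({1, 2, 5, 6}, user_blocks)
accused_b = trace_by_overlap({1, 2, 3, 5}, user_blocks)
```

In both cases every accused user belongs to the coalition, because the innocent user's block meets the pirate key in at most one point.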

II  Proposed  construction 

In  [1,  page  614],  it  was  remarked  that  cover  free  families  could 
be  used  to  construct  traceability  schemes.  Actually,  we  can 
prove  the  following  theorem. 

Theorem II.1 If there exists a (v, N, k, c, D)-cover free family,
then there exists a (c, N)-BEx such that overhead = v/D.
Further, if k < D + ⌈D/c⌉, then it can be used as a c-traceability
scheme as well.

part  of  this  research  was  funded  by  NSF  CCR-9903216. 


In this section, we show a construction of cover free families
which satisfy both overhead = O(c^2) and k < D + ⌈D/c⌉ by
using almost strongly universal hash functions. Even for BExs
only, our construction is conceptually much simpler and much
easier than that of [1].

Let X and Y be finite sets such that |X| > |Y|. Let H be
a set of functions such that h : X → Y for each h ∈ H. Let
|H| = v, |X| = m, |Y| = n.

Definition II.1 [6] We say that H is an ε-almost strongly
universal (ε-ASU_2(v, m, n)) hash function family provided that
the following two conditions are satisfied:

1. for any x ∈ X and any y ∈ Y, there exist exactly
|H|/|Y| functions h ∈ H such that h(x) = y.

2. for any two distinct elements x_1, x_2 ∈ X and for any
two (not necessarily distinct) elements y_1, y_2 ∈ Y, there
exist at most ε|H|/|Y| functions h ∈ H such that
h(x_i) = y_i, i = 1, 2.
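
The two conditions can be checked by brute force for a small family. The sketch below uses the affine maps h(x_1, x_2) = a_1 x_1 + a_2 x_2 + b over Z_3 as an example; this family and the function name are assumptions chosen for illustration and are not the construction used in this paper.

```python
from fractions import Fraction
from itertools import product

def asu2_epsilon(family, domain, codomain):
    """Return the smallest epsilon for condition 2, or None if condition 1 fails."""
    v, n = len(family), len(codomain)
    for x, y in product(domain, codomain):
        if sum(h[x] == y for h in family) * n != v:  # condition 1
            return None
    worst = 0
    for x1, x2 in product(domain, repeat=2):
        if x1 == x2:
            continue
        for y1, y2 in product(codomain, repeat=2):
            hits = sum(h[x1] == y1 and h[x2] == y2 for h in family)
            worst = max(worst, hits)
    return Fraction(worst * n, v)  # worst <= epsilon * v / n

p = 3
domain = list(product(range(p), repeat=2))  # m = 9
codomain = list(range(p))                   # n = 3
family = [{x: (a1 * x[0] + a2 * x[1] + b) % p for x in domain}
          for a1, a2, b in product(range(p), repeat=3)]  # v = 27
eps = asu2_epsilon(family, domain, codomain)
```

Each function is stored as a dictionary from inputs to outputs, which keeps the check independent of how the family was generated.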

Theorem II.2 If there exists an ε-ASU_2(v, m, n) hash function
family H, then there exists a (v, N, k, c, D)-cover free family
such that N = mn, k = v/n, D = (v/n)(1 − cε) + 1.

Theorem II.3 There exists an ε-ASU_2(v, m, n) hash function
family  such  that  v  —  ql+2,  n  =  q,  m  =  qlq  and 


for  1  <  VI  <  V/  <  Vq  =  prime  power. 

Corollary  II.  1  There  exists  a  ( v ,  N,  k,  c,  D)-cover  free  family 

such  that  v  =  q(+2,  N  =  qiq  +I ,  k  —  qi+1  and  D  =  q,+1  — 

c(lql  +  q(+1  —  q)  +  1  for  1  <  Vf  <  V/  <  Vq  =  prime  power. 

Corollary II.2 Let q be a prime power and let c = √q/3.
Then there exists a (c, N)-BEx such that overhead = O(c^2).
Further, it can be used as a c-traceability scheme as well.

(Proof) Let t = 1 and l = 2.

Abstract — It is shown that oblivious transfer of
strings can be reduced to the weakest version of oblivious
bit transfer, where the receiver can choose to obtain
arbitrary (but incomplete) information about the
pair of bits sent. This solves an open problem posed
by Brassard and Crépeau.

I.  Equivalence  Between  Oblivious  Transfers 

Important cryptographic primitives, such as secure message
transmission, key agreement, or secure multi-party computation,
can often be reduced to apparently much weaker
primitives such as noisy communication channels or correlated
randomness. In this note we present an information-theoretic
reduction of so-called 1-out-of-2 oblivious string transfer to a
weak variant of oblivious bit transfer, called universal oblivious
transfer. Oblivious-transfer primitives are of central
importance for many cryptographic protocols. In principle,
oblivious transfer allows for carrying out any secure two-party
computation.

The standard oblivious bit transfer (bit OT) between two
parties corresponds to a binary erasure channel with erasure
probability 1/2: The sender’s input is a bit b, which the receiver
learns with probability 1/2, whereas otherwise, he obtains
no information about b. The sender on the other hand
does not learn whether the bit has been received or not.

In 1-out-of-2 bit OT ((2 choose 1)-OT for short) the sender sends
two one-bit messages, exactly one of which the receiver can
choose to read, remaining completely ignorant about the other
one, such that the sender does not get any information about
which message has been chosen. In 1-out-of-2 k-bit-string OT
((2 choose 1)-OT^k) the messages are k-bit strings instead of single bits.

The problem of reducing string OT to bit OT was studied
by many authors (see [1] and the references therein). In [1],
a reduction was presented based on so-called privacy amplification
by hashing with linear functions. It was even shown
that (2 choose 1)-OT^k with security s, i.e., such that with probability
at least 1 − 2^{−s}, the receiver obtains no information at
all about one of the transmitted strings, even when given the
other, can be reduced to n = O(k + s) realizations of generalized
OT (GOT), where the receiver can choose to learn any
one-bit function (such as b_0, b_0 ⊕ b_1, or b_0 ∧ b_1) about the two
bits b_0 and b_1 sent. Protocol BC, which achieves this, works
as follows. First, GOT is applied n times with random input
bits (x_i, y_i). Then, the two k-bit messages m_0 and m_1 to be
sent by (2 choose 1)-OT^k are blinded by (i.e., xor-ed with) two k-bit
strings h_0(x_1, ..., x_n) and h_1(y_1, ..., y_n), respectively, where
h_0 and h_1 are two linear functions from n-bit to k-bit strings,
chosen randomly and published by the sender.
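
The message flow of Protocol BC for an honest receiver can be sketched as follows. The underlying primitive is replaced here by an idealised 1-out-of-2 bit OT in which the receiver learns either all x_i or all y_i; this simplification, the representation of h_0 and h_1 as random binary matrices, and all function names are assumptions for illustration and do not model a cheating receiver.

```python
import random

def random_linear_map(k, n, rng):
    """A random k x n binary matrix, i.e. a linear map from n-bit to k-bit strings."""
    return [[rng.randrange(2) for _ in range(n)] for _ in range(k)]

def apply_map(h, bits):
    return [sum(a & b for a, b in zip(row, bits)) % 2 for row in h]

def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]

def protocol_bc(m0, m1, choice, n, rng):
    """Return the string an honest receiver with the given choice bit recovers."""
    k = len(m0)
    xs = [rng.randrange(2) for _ in range(n)]
    ys = [rng.randrange(2) for _ in range(n)]
    # Idealised bit OT (assumption): the receiver learns x_i or y_i in every round.
    received = [y if choice else x for x, y in zip(xs, ys)]
    h0, h1 = random_linear_map(k, n, rng), random_linear_map(k, n, rng)
    c0 = xor(m0, apply_map(h0, xs))  # blinded m_0, published by the sender
    c1 = xor(m1, apply_map(h1, ys))  # blinded m_1, published by the sender
    h, c = (h1, c1) if choice else (h0, c0)
    return xor(c, apply_map(h, received))
```

The receiver can strip the blinding only from the message whose input bits it collected, which is the correctness half of the reduction; the security half is the subject of the results below.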

It was stated as an open problem in [1] how this result
generalizes to a primitive offering the receiver the possibility
to obtain arbitrary (probabilistic) information about the
pair (b_0, b_1). We show that the most optimistic answer is the
correct one: Whenever the information the receiver obtains
in such universal OT (UOT) does not completely determine
(b_0, b_1), then string OT can be reduced to this primitive. The
argument is based on the fact that among all types of an adversary’s
side information about a single bit with given error
probability, there exists a “strictly worst case,” namely information
obtained from a symmetric erasure channel. This also
allows for simplifying the proofs given in [1] and for improving
the results with respect to the involved constants. Related results
in models different from the one of [1] were shown in [2].

II.  The  Power  of  Universal  OT 

Definition 1. Let α > 0. In universal OT with parameter
α (α-UOT), the sender’s input is a pair of bits (b_0, b_1). The
receiver specifies a possibly probabilistic function Ω which
must satisfy H((b_0, b_1) | Ω(b_0, b_1)) ≥ α if (b_0, b_1) is uniformly
distributed. Then the receiver obtains Ω(b_0, b_1), but no
additional information about (b_0, b_1). The sender on the
other hand does not learn anything about Ω.

Theorem 1. Protocol BC reduces (2 choose 1)-OT^k with security
s to at most ⌈(s + 2k) ln 2 / p_e⌉ realizations of α-UOT,
where p_e is the unique solution (< 1/2) to the equation
h(2x) + 2x log 3 = α.
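
The bound of Theorem 1 can be evaluated numerically. The sketch below assumes that h is the binary entropy function in bits, that log denotes the base-2 logarithm, and that the relevant root lies in the interval 0 ≤ x ≤ 3/8, on which the left-hand side increases from 0 to 2; these readings and the function names are assumptions.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def solve_pe(alpha, tol=1e-12):
    """Bisection for the root of h(2x) + 2x*log2(3) = alpha on [0, 3/8]."""
    lhs = lambda x: h2(2 * x) + 2 * x * math.log2(3)
    lo, hi = 0.0, 0.375
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lhs(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def uot_calls(s, k, alpha):
    """Upper bound ceil((s + 2k) ln 2 / p_e) on the number of alpha-UOT calls."""
    return math.ceil((s + 2 * k) * math.log(2) / solve_pe(alpha))
```

A smaller α gives a smaller p_e and hence a larger number of required realizations, as expected for a weaker primitive.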

Lemma 2. Let B be a symmetric binary random variable,
and let U be a random variable such that B and U have joint
distribution P_{BU}. Let p be the average error probability of
guessing B when given U, using the optimal guessing strategy.
Then there exists a random variable V with the following
properties. First, V takes values in {0, 1, Δ} and P_V(Δ) = 2p holds, and
for every u ∈ U, we have P_{B|U=u,V=Δ}(0) = P_{B|U=u,V=Δ}(1).

Proof. Let u ∈ U, and assume that a = P_{B|U=u}(0) ≥
P_{B|U=u}(1) = b. Let V be defined by P_{V|B=0,U=u}(0) =
(a − b)/a, P_{V|B=0,U=u}(Δ) = b/a, and P_{V|B=1,U=u}(Δ) = 1.
Note that P_{V|U=u}(Δ) = 2b, i.e., twice the error probability
for guessing B when given U = u. □
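
The construction in the proof can be checked with exact arithmetic on a small example. The joint distribution below and the function names are hypothetical; the sketch follows the definition of V from the proof.

```python
from fractions import Fraction as F

def erasure_probability(joint):
    """joint[u] = (P(B=0, U=u), P(B=1, U=u)); returns P_V(erasure)."""
    total = F(0)
    for p0, p1 in joint.values():
        pu = p0 + p1
        # Conditional probabilities of the likelier and the less likely value of B.
        a, b = max(p0, p1) / pu, min(p0, p1) / pu
        # Erasure occurs with probability b/a for the likelier value, 1 otherwise.
        erase_likelier, erase_other = a * (b / a), b * 1
        assert erase_likelier == erase_other  # B is uniform given an erasure
        total += pu * (erase_likelier + erase_other)
    return total

def guessing_error(joint):
    """Average error probability of the optimal guess of B given U."""
    return sum(min(p0, p1) for p0, p1 in joint.values())

joint = {"u1": (F(3, 8), F(1, 8)), "u2": (F(1, 8), F(3, 8))}
```

For this example the optimal guess errs with probability 1/4 and the constructed V is an erasure with probability 1/2.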

The idea of the proof of Theorem 1 is as follows. By Fano’s
inequality, one can conclude that, when the pair (x_i, y_i) is sent
by α-UOT, for at least two of the bits x_i, y_i, and x_i ⊕ y_i,
the receiver’s error probability when guessing the bit with the
optimal strategy is at least p_e. Because of Lemma 2, we can
assume that with probability at least 2p_e, the receiver has no
information at all about such a bit. By construction, this implies
that with overwhelming probability, the receiver cannot
bias g(h_0(x_1, ..., x_n), h_1(y_1, ..., y_n)) for any linear function g
with range {0, 1} and depending non-trivially on both inputs.
In this case, he has no information at all about one of the
inputs, even when given the other.

Abstract — This paper considers “vector multiple
access channels” (VMAC) where each user has multiple
“degrees of freedom” and studies the effect of
power allocation as a function of the channel state on
the “sum capacity”, defined as the maximum sum of
rates of users per unit degree of freedom at which the
users can jointly reliably transmit, in an information-theoretic
sense. A concrete example of a VMAC is
a MAC with multiple antennas at the receiver where
the antennas provide spatial degrees of freedom. Our
main result is the identification of a simple dynamic
power allocation scheme that is optimal in a large system,
i.e., in the regime of a large number of users and
a correspondingly large number of antennas. A key
feature of this policy is that, for any user, it depends
only on the instantaneous amplitude of the slow fading
component of the vector channel of that user alone
and the structure of the policy is “waterfilling”.

I.  Introduction  and  Problem  Statement 

A discrete-time baseband frequency-flat fading channel model
for the multiple antenna, multiple access channel is the following:

  y(n) = Σ_{i=1}^{K} x_i(n) h_i^s(n) h_i^f(n) + w(n).

Here K denotes the number of users and n the channel use
instant. The user symbols are denoted by x_i, and y(n) is the
received signal (thought of as an N-dimensional vector, N being
the number of antennas at the receiver) at the antenna
array at time instant n. Here w(n) is additive white Gaussian
noise. The channel (a vector with N components) from
user i to the antenna array at time instant n is written as
h_i^s(n) h_i^f(n). Here h_i^s is a scalar that varies slowly in time and
captures the distance loss and the shadowing effects and thus
depends only on the user. The fast fading component, which
changes due to the destructive and constructive additions of
the signals from multiple paths, is represented by the vector h_i^f,
which depends on the individual antenna elements. For the
purpose of this summary, we will assume that the fading processes and
{w(n)}_n are independent stationary and ergodic processes.
We are interested in the scenario of coherent communication,
i.e., the scenario in which the receiver is able to track the channel
variations reliably.

Our performance measure is the long term sum capacity:
the sum of rates at which users jointly reliably communicate.
These rates are time averaged with a power constraint on the
users which is also averaged in time. We are interested in
characterizing the sum capacity with and without feedback
of channel states to the users. If there is no feedback to the
users, then a coding theorem shows that the users transmit
at constant power. When there is feedback information of the
channel state, users can modulate their power based on this
knowledge. The problem addressed here is the characterization
of the power allocation policies that are optimal in the
sense of maximizing sum capacity of the system.

This work was supported by NSF under grant IRI 97-12131 and
by an NSF CAREER Award under grant NCR 97-34090.

II.  Main  Result 

In the one antenna scenario, there is a simple characterization
of the optimal power policy (only the user with the best
channel is allowed to transmit and this user uses a waterfilling
power policy) and the gap between the sum capacity
using this optimal policy and the sum capacity with no channel
state feedback is very large (unbounded in the number
of users). However, in the general case of multiple antennas,
there is no closed form solution to the optimal power policy,
which for any user is some function of the paths from all the
users to the antenna array. Our main result below identifies
a simple waterfilling power allocation policy that is optimal
in the regime of large number of users and antennas: consider
the power policy that for any user depends only on the
slow fading component of that user alone and whose structure
is that of “waterfilling”. Observe that in the regime when
slow fading is constant over the time scale of communication,
this policy simply allocates constant power. Our main result
is that this is a very good approximation to the complicated
optimal power policy. In particular, this means that feeding
back only the slow fading component is asymptotically sufficient
in the multiple antenna scenario. Denoting the ratio of
users to antennas by α, we have our main result below.

Theorem 1 For all α, for all SNR levels, for all fading distributions,

  lim sup_{N→∞} √N (Sum Capacity with optimal power policy
      − Sum Capacity with waterfilling policy) < ∞.

The details of this summary are available in [1].
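
The summary does not state the waterfilling rule explicitly. As a point of reference, the following Python sketch implements the classical single-user form P(g) = max(0, μ − 1/g) over a discrete distribution of slow-fading gains, with the water level μ found by bisection so that the average power constraint is met. The formula, the normalisation of the gains by the noise power, and the function name are assumptions and need not coincide with the policy analysed in [1].

```python
def waterfill(gains, probs, avg_power, tol=1e-12):
    """Return the power used in each fading state under classical waterfilling."""
    def spent(mu):
        # Average power used when the water level is mu.
        return sum(p * max(0.0, mu - 1.0 / g) for g, p in zip(gains, probs))

    lo, hi = 0.0, avg_power + max(1.0 / g for g in gains)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if spent(mid) < avg_power:
            lo = mid
        else:
            hi = mid
    mu = (lo + hi) / 2
    return [max(0.0, mu - 1.0 / g) for g in gains]

# Hypothetical three-state slow-fading distribution.
gains, probs = [0.1, 1.0, 4.0], [0.2, 0.5, 0.3]
alloc = waterfill(gains, probs, avg_power=1.0)
```

In this example the weakest state receives no power and the strongest state receives the most, which is the qualitative behaviour the term “waterfilling” refers to.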
References 
minor revisions, Feb 2000. Also available as UCB/ERL Memo¬

I. Introduction

We consider a spread spectrum, multi-user channel, with
an antenna array at the receiver, and independent flat
fading to each antenna from each user. We focus on
the case of microdiversity, and show that a curious phenomenon
of “resource pooling” arises.

II.  Basic  Multi-Antenna  Model 

We consider a sampled discrete-time baseband model for
a symbol-synchronous multi-access CDMA system with
K users, L receive antennas and processing gain N. The
received signal at the l-th antenna is given by

  Y(l) = Σ_{k=1}^{K} X_k √(T_k) γ_k(l) s_k + W(l),    (1)

where X_k is the symbol transmitted by user k at transmit
power T_k, γ_k(l) is the complex fading channel gain from
user k to antenna l, s_k ∈ C^N is the signature sequence
of user k, Y(l) ∈ C^N, and W(l) is additive white Gaussian
noise with variance σ^2, independent across l. The
symbol energy E[X_k^2] is normalized to be 1. Here, we are
assuming a flat fading channel model, and the channel
gains are assumed to be circular symmetric, as is typical
for a baseband model. Furthermore, we make the
additional assumption of “microdiversity”: we assume
that the marginal distributions of the γ_k(l)'s are identical
across both antennas and users. We will also allow
the transmit powers T_k to depend on the magnitudes of
the channel gains γ_k(l) for all k and l, but to be independent
of everything else. This models the use of power control.

The optimal linear receiver is known as the MMSE receiver.
While an explicit expression for the SIR of the
MMSE is well known, we obtain more insight by proving
an asymptotic result as the system grows large, under
randomly selected signature sequences: assume that the
chip values of the sequences are i.i.d. circular symmetric
complex Gaussian random variables with mean zero
and variance 1/N, and the sequences of different users are
chosen independently.

Supported  by  an  Australian  Research  Council  Small  Grant 

III. Main Result

Theorem 1 Let P_k = T_k Σ_{l=1}^{L} |γ_k(l)|^2 be the sum of the
received powers of user k. Assume that almost surely the
empirical distribution of (P_1, ..., P_K) converges weakly to
a limiting distribution F as N grows large, and that the P_k's
are uniformly bounded for all k and N. Then if N, K →
∞ with K/N → α but L fixed, SIR_1/P_1 converges in
probability to a deterministic constant a, where a is the
unique positive solution to the fixed-point equation:


and  P  is  a  random  variable  having  distribution  F . 

Proof See [2]. □

This result says that in a wideband system with many
users, the SIR of a user does not depend on the specific
realization of the signature sequences, the channel gains
and the transmit powers. The SIR is a function of the
user’s own received powers at the antennas and depends
on the interferers’ received powers only through the
limiting empirical distribution of the P_k's. In a sense,
there is an averaging of the effects across the large number
of interferers. Furthermore, by comparing this result
with our main result in [1], we see that the multi-antenna
system here is behaving, asymptotically, just like a single
antenna system with α/L users per degree of spreading,
and received power being the pooled received power from
the multi-antenna system.

Abstract — The design of binary power control algorithms
for cellular communication systems is considered
in the context of code division multiple access
(CDMA). The control problem is posed as a tradeoff
between the desire for users to meet their signal-to-interference
ratio (SIR) requirements and the need to
minimize the transmitted signal energy over the duration
of their calls. The dynamic nature of the wireless
channel for mobile users is incorporated in the problem
definition. Based on dynamic programming arguments,
an optimal single-user solution is obtained.

I.  Introduction 

The analysis of power control for wireless multi-access systems
has been well documented over the past decades, with
new contributions often motivated by the need to address
practical issues. Much of the work on uplink power control
for CDMA systems has focused on static channel models,
i.e., models in which the channel gain of every user is
assumed constant. The performance results obtained under
this assumption will be valid as long as the reaction time of
the power control algorithm is small compared to the coherence
time of the underlying wireless channel. In other words,
the transmitted power of each user is implicitly assumed to
converge to its optimal level before any significant change occurs
in the channel state. We propose a different approach to
the design of power control algorithms and include a dynamic
stochastic channel model as part of the problem definition.

II.  Problem  Formulation 

We  address  the  control  problem  as  a  tradeoff  between  the 
desire  for  users  to  meet  their  SIR  requirements  and  the  need 
to  minimize  transmitted  energy  over  time  intervals  in-between 
control signals. Effectively, this formulation leads to a tradeoff
between the user capacity of the overall system and the
link  quality  of  individual  users.  We  wish  to  solve  this  design 
problem  using  dynamic  programming.  To  cast  the  problem 
into  a  dynamic  programming  framework,  we  need  to  develop 
a  discrete-time  model  for  the  underlying  wireless  channel,  and 
to  define  an  appropriate  cost  function. 

We  adopt  the  standard  tap-delay  line  channel  model  [5] 
and  assume  the  channel  gains  to  vary  slowly  with  respect  to 
the  time  interval  in-between  control  signals.  In  dealing  with 
slowly  varying  channel  gains,  it  is  convenient  to  develop  an 
equivalent  discrete-time  channel  model  for  the  analog  system. 
REER/PECASE grant under CCR-9733204, and by the Office of
Naval  Research  under  grant  N0014-97-1-0823.  J.-F.  Chamberland 
was  also  supported  by  a  Fonds  FCAR  fellowship. 
After maximal ratio combining, the discrete-time channel gain
G[·] becomes a function of the gain coefficients {E_l[·]}_{l=1}^{L}, and
is given by G[k] = Σ_{l=1}^{L} |E_l[k]|^2, where L is the number of
resolvable paths.

We define the individual cost g as a function of the target
SIR γ̄ and the actual SIR γ at the receiver. When γ is below
the target SIR γ̄, a fixed cost is incurred, which accounts for
the user not meeting the target SIR requirement. Otherwise,
the SIR at the output of the receiver exceeds the target SIR
and we make the cost function proportional to the excess of
transmitted energy. The cost per stage function captures the
tradeoff between the desire for users to meet their SIR requirements
and the need to minimize the transmitted energy over
the control period.

We pose the optimization problem as a discounted cost infinite
horizon problem [4]. The discounting factor α < 1 reflects
the uncertainty on the time duration of a call and our
level of confidence in the accuracy of the channel parameters
over time. Given an initial state x_0 (channel plus transmitted
power) and a discounting factor, we want to minimize the total
cost J(x_0) = lim_{N→∞} E[Σ_{k=0}^{N−1} α^k g(x_k)]. Standard dynamic
programming steps lead to an optimal stationary policy which
satisfies Bellman’s equation.
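
A minimal value-iteration sketch shows how such a stationary policy can be computed for a toy instance. The two-state Markov channel, the power grid, the cost weights and the unit noise power are all hypothetical choices made for illustration; they are not the channel model or parameters of this paper.

```python
GAINS = [0.5, 2.0]                         # hypothetical channel gains G
STAY = 0.9                                 # hypothetical probability of keeping the gain
POWERS = [0.25 * 2 ** j for j in range(6)]  # hypothetical transmit power levels
TARGET, OUTAGE_COST, ENERGY_WEIGHT, ALPHA = 1.0, 5.0, 1.0, 0.9

def stage_cost(c, j):
    """Fixed cost below the target SIR, otherwise cost of the excess power."""
    sir = GAINS[c] * POWERS[j]             # unit noise power assumed
    if sir < TARGET:
        return OUTAGE_COST
    return ENERGY_WEIGHT * (POWERS[j] - TARGET / GAINS[c])

def moves(j):
    """Binary control: one step down or one step up, clamped to the grid."""
    return max(j - 1, 0), min(j + 1, len(POWERS) - 1)

def expected_cost_to_go(J, c, j):
    return STAY * J[c][j] + (1 - STAY) * J[1 - c][j]

def bellman_backup(J, c, j):
    return stage_cost(c, j) + ALPHA * min(
        expected_cost_to_go(J, c, nj) for nj in moves(j))

def value_iteration(tol=1e-10):
    J = [[0.0] * len(POWERS) for _ in GAINS]
    while True:
        new = [[bellman_backup(J, c, j) for j in range(len(POWERS))]
               for c in range(len(GAINS))]
        gap = max(abs(new[c][j] - J[c][j])
                  for c in range(len(GAINS)) for j in range(len(POWERS)))
        J = new
        if gap < tol:
            return J

def policy(J, c, j):
    """Return -1 (power down) or +1 (power up) for state (c, j)."""
    down, up = moves(j)
    better_down = expected_cost_to_go(J, c, down) <= expected_cost_to_go(J, c, up)
    return -1 if better_down else 1
```

The resulting table `policy(J, c, j)` over all states is the look-up table form of the solution; its size grows with the number of channel and power levels.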

III.  Discussion  and  Conclusion 

The dynamic programming algorithm yields as a solution
a look-up table. Although large look-up tables are hard to
implement, this solution provides an upper bound on the performance
of practical systems. For instance, the performance
of the best threshold policy comes close to that of the dynamic
programming solution. Thus, the technique we developed can
be employed to assess the performance of simpler, easily
implementable algorithms.

Abstract — The purpose of this paper is to provide a
framework for resource allocation and admission control
in a DS-CDMA system in which there are several
traffic classes with different rates and quality of
service requirements. We focus on uplink (mobile to
base) transmission, in which the transmissions from
different mobiles are uncoordinated. For special cases
of two traffic classes, we show that, for large systems,
a Gaussian approximation for the interference implies
that the boundary of the admission control region is
approximately a straight line and the optimal power
ratio P_2/P_1 is roughly the same throughout the boundary
of the admission control region.

I.  Introduction 

Our  framework  is  based  on  the  following  assumptions: 

(a)  The  users  transmit  using  fixed-length  packets,  and  are 
assumed  to  be  synchronized  at  the  packet  level.  Thus,  the 
system  is  time-slotted,  with  a  slot  equal  to  a  packet  duration. 

(b)  The  traffic  generated  by  a  user  may  be  bursty  (e.g.,  voice, 
variable  bit  rate  video,  TCP).  However,  we  assume  that  the 
bit  rate  over  a  given  packet  is  constant,  which  means  that  the 
processing  gain  over  a  packet  is  fixed  (since  the  chip  rate  is 
fixed). 

(c) The event of packet loss for a given user is well approximated
by the event that the Signal-to-Interference-plus-Noise
Ratio (SINR) falls below a threshold.

II.  System  Model 

We  consider  on-off  traffic  sources  here,  for  which  the  offered 
bit  rate  can  take  one  of  only  two  possible  values,  the  peak  rate 
and zero. For an on-off source of traffic class i, the processing
gain when the source is on is determined by its peak rate,
and is denoted by N_i. Our purpose is to determine the region
of allowable tuples (K_1, K_2, ...), where
K_i denotes the number of sources of type i. This also requires
determining the optimal values of the received powers {P_i},
where P_i denotes the desired received power for a user of type
i. Our model is simpler than the models in [1] and [2], in that
we  allow  the  processing  gain  to  be  fixed  by  the  offered  rate, 
and  only  choose  the  received  powers  for  the  different  traffic 
classes.  This  enables  us  to  obtain  a  simpler  characterization 
of  the  admission  control  region. 

III. Main results

1This  work  was  supported  by  the  National  Science  Foundation 
under  a  CAREER  award  NSF  NCR96-24008CAR  and  under  grant 
NSF  CCR99T9381. 


Since  we  consider  on-off  sources,  at  each  given  time  slot, 
we  assume  each  user  of  traffic  type  i  is  active  with  probability 
Pi.  The  allowable  packet  loss  rate  for  a  user  of  type  i  is  qi. 
The  packet  loss  event  for  a  user  of  type  i  is  approximated 
by  the  event  that  the  SINR  seen  by  the  packet  falls  below 
a  threshold  'yi .  For  simplicity  of  illustration,  we  employ  the 
SINR  expression  for  a  chip-synchronous  DS-CDMA  system 
with  conventional  matched  filter  reception,  so  that  the  SINR 
for  a  typical  packet  of  type  i  is  given  by 

SINRi  =  — ^ -  PtNi  - 

Eti  XikPi  +  EU  Efii  XjkPj  + 

where $\chi_{ik}$ is an on/off indicator (i.e., $\chi_{ik} = 1$ if user
$k$ of type $i$ is active, and 0 otherwise), and $\sigma^2$ is the
background noise power, which represents the inter-cell interference.
The on/off indicators $\{\chi_{ik}\}$ are assumed to be independent
random variables with $P[\chi_{ik} = 1] = p_i$ and
$P[\chi_{ik} = 0] = 1 - p_i$. We consider the case of two traffic
classes. Given $K_1$ and $K_2$, we say that $(K_1, K_2)$ is admissible
if the following conditions are satisfied:

$$P[\mathrm{SINR}_i < \gamma_i] \le q_i, \quad i = 1, 2. \qquad (1)$$

Assuming  that  the  contribution  of  a  single  user’s  power  to  the 
total  transmitted  power  is  negligible,  we  can  rewrite  (1)  as 

$$P[X_1 + r X_2 > a_1] \le q_1, \qquad P[X_1 + r X_2 > a_2 r] \le q_2, \qquad (2)$$

where $r = P_2/P_1$, $a_i = N_i/\gamma_i - \sigma^2/P_i$, and
$X_i = \sum_{k=1}^{K_i} \chi_{ik}$, $i = 1, 2$. When the system size is
large, we may approximate $X_1 + r X_2$ by a Gaussian random variable
based on the Central Limit Theorem. For a class of specific scenarios
considered in the paper, we obtain the following results:

(a) Given $K_1$, the number of users of type 1, there is an optimal
value $r(K_1)$ of the power ratio such that the number of admissible
users of type 2 is maximized.

(b) For a large system, the maximum number of users of type 2 is
approximately a linear function of $K_1$. Also, the power ratio
$r(K_1)$ is approximately equal to a constant $r^*$.

Simulation results show that, for large systems, the Gaussian
approximation provides an admission region close to the exact
admission region, which is also well approximated by fixing the power
ratio $r(K_1)$ at the constant $r^*$.
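
As a rough illustration of how condition (2) can be checked under the Gaussian approximation, the following Python sketch replaces $X_1 + r X_2$ by a normal variable with the same mean and variance. The function names, the argument layout, and the numerical values in the usage note are illustrative assumptions and are not taken from the paper.

```python
import math


def q_function(x):
    """Tail probability P[Z > x] of a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def overflow_probability(k1, k2, p1, p2, r, threshold):
    """Gaussian approximation of P[X1 + r*X2 > threshold],
    where X1 ~ Binomial(k1, p1) and X2 ~ Binomial(k2, p2)."""
    mean = k1 * p1 + r * k2 * p2
    var = k1 * p1 * (1 - p1) + r * r * k2 * p2 * (1 - p2)
    if var == 0.0:
        # Degenerate case: the sum is deterministic.
        return 1.0 if mean > threshold else 0.0
    return q_function((threshold - mean) / math.sqrt(var))


def is_admissible(k1, k2, p, q, n_gain, gamma, power1, r, sigma2):
    """Check both inequalities of (2) for two traffic classes."""
    power2 = r * power1
    a1 = n_gain[0] / gamma[0] - sigma2 / power1
    a2 = n_gain[1] / gamma[1] - sigma2 / power2
    loss1 = overflow_probability(k1, k2, p[0], p[1], r, a1)
    loss2 = overflow_probability(k1, k2, p[0], p[1], r, a2 * r)
    return loss1 <= q[0] and loss2 <= q[1]


def max_type2_users(k1, r, **params):
    """Largest k2 reached by adding type-2 users one at a time
    (hypothetical helper; returns None if (k1, 0) is not admissible)."""
    if not is_admissible(k1, 0, r=r, **params):
        return None
    k2 = 0
    while is_admissible(k1, k2 + 1, r=r, **params):
        k2 += 1
    return k2
```

With assumed values such as `p=(0.5, 0.5)`, `q=(0.01, 0.01)`, `n_gain=(64, 32)`, `gamma=(4, 4)`, `power1=1.0` and `sigma2=1.0`, scanning `max_type2_users(k1, r, ...)` over a grid of `r` gives a numerical picture of the optimal ratio $r(K_1)$ described in result (a).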

Abstract - Two wavelet-based estimators on fractional Brownian motion
(FBM) are evaluated through the large deviation principle (LDP). These
are $\hat\sigma_j^2$ and $\hat H$, the estimators of (i) the variance
of the wavelet coefficients of FBM for each scale $j$ and (ii) the
Hurst parameter, respectively, where $\hat H$ is obtained from the
slope of the linear regression of $\log_2 \hat\sigma_j^2$ over a number
of scales. Both estimators are shown to be consistent by the ergodic
theorem. We perform detailed calculations related to the LDP for
stationary Gaussian processes with unbounded and non-$L^2$ power
spectrum, to obtain $L^1$-estimates of the convergence of both
estimators. A wavelet-based representation of the bias of the
estimators is introduced and successfully used in the theory, carrying
the quantitative results on FBM over to the corresponding analysis of
wavelet coefficients.

I.  Introduction  and  Preliminaries 

Let the wavelet coefficients $\{d_j(k);\ j \in \mathbb{Z},\ k \in \mathbb{N}_0\}$
of FBM $\{B_H(t);\ t \in \mathbb{R}_+\}$ be

$$d_j(k) = \int B_H(t)\, \psi_{j,k}(t)\, dt,$$

where $\psi_{j,k}(t) = 2^{-j/2} \psi(2^{-j} t - k)$. Let $T_0 > 0$ be
the time instant up to which the FBM signal is observed. Then the
number $N_j(T_0) \in \mathbb{N}$ of available wavelet coefficients at
scale $j$ up to $T_0$ satisfies $N_j \sim 2^{-j} T_0$. We assume that
the wavelet $\psi$ is compactly supported on $\mathbb{R}_+$ and
satisfies the vanishing moment condition of sufficient order.

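The two estimators we consider are $\hat\sigma_j^2(T_0)$ and
$\hat H_{T_0}$, defined by

$$\hat\sigma_j^2 = \frac{1}{N_j} \sum_{k=0}^{N_j - 1} |d_j(k)|^2$$

and

$$\log_2(\hat\sigma_j^2) = (2 \hat H_{T_0} + 1)\, j + \mathrm{const}.$$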
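
The two definitions above can be sketched in Python as follows: the sample variance of the coefficients is computed for each scale, and the Hurst estimate is read off the least-squares slope of $\log_2 \hat\sigma_j^2$ against $j$. The function names and the dictionary layout of the input (scale index mapped to a list of coefficients) are assumptions made for this illustration; computing the wavelet coefficients themselves is outside the sketch.

```python
import math


def scale_variances(coeffs_by_scale):
    """Sample variance estimate (1/N_j) * sum |d_j(k)|^2 for each scale j."""
    return {j: sum(abs(d) ** 2 for d in ds) / len(ds)
            for j, ds in coeffs_by_scale.items()}


def hurst_from_variances(variances):
    """Least-squares slope of log2(variance) versus j, mapped to H
    through slope = 2H + 1."""
    scales = sorted(variances)
    logs = [math.log2(variances[j]) for j in scales]
    n = len(scales)
    j_mean = sum(scales) / n
    y_mean = sum(logs) / n
    num = sum((j - j_mean) * (y - y_mean) for j, y in zip(scales, logs))
    den = sum((j - j_mean) ** 2 for j in scales)
    slope = num / den
    return (slope - 1.0) / 2.0
```

At least two scales are needed for the regression; using several scales averages out the fluctuation of the individual variance estimates.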

The  following  relations  are  fundamental. 

Proposition 1. For $j = 1, \ldots, J$ and $k \in \mathbb{N}_0$,

$$d_j(k) \stackrel{d}{=} 2^{(H + (1/2))j} \int_{\mathbb{R}_+} B_H(t + k)\, \psi(t)\, dt,$$

where $\stackrel{d}{=}$ denotes equality in distribution.

This work was supported by Japan Society for Promotion of
Science, 11740057.


For each fixed $s, t \in \mathbb{R}_+$, let $\{Y_k(s,t);\ k \in \mathbb{N}_0\}$
be defined by $Y_k(s,t) = B_H(s + k) - B_H(t + k)$, $k \in \mathbb{N}_0$.
Then $\{Y_k(s,t);\ k \in \mathbb{N}_0\}$ is a stationary-increment
sequence. Let $V_H$ be the variance of $B_H(1)$.

Proposition 2. For each $j$,

$$\sigma_j^2 = E[|d_j(0)|^2] = \frac{1}{2}\, 2^{(2H+1)j} \int_{\mathbb{R}_+} \int_{\mathbb{R}_+} \psi(s)\, \psi(t) \left( -V_H |s - t|^{2H} \right) ds\, dt.$$


II.  RESULTS 

Using the ergodicity of $\{Y_k\}$, we can obtain the consistency of
the estimators, in the form of $L^1$- and a.s. convergence, as well as
$L^2$-convergence.

The  following  results  are  our  main  theorems. 

Theorem 3. There exists a constant $C_H > 0$ such that

$$E\left[\, \left| \hat\sigma_j^2(T_0) - E[|d_j(0)|^2] \right| \,\right] \le C_H \cdot 2^{(2H + (3/2))j} \cdot T_0^{-1/2}$$

for each $j$.

The $L^2$-estimate for $\hat\sigma_j(T_0) - E[|d_j(0)|^2]^{1/2}$ is
then immediately obtained from the above.

Theorem 4. There exists a constant $C_H > 0$ such that

$$E\left[\, |\hat H_{T_0} - H|^2 \,\right] \le C_H \cdot T_0^{-1}.$$

Theorems 3 and 4 are derived from the following theorem, which itself
is obtained from the LDP for stationary Gaussian processes.


Theorem 5. For each $s, t \in \mathbb{R}_+$ and $N \in \mathbb{N}$,

EiiiL{^(s-i)}2-^is-(i2f 


<  4y/7 x(Vh  |S  -  t\ 


2H12  .  JV~1/2 

ABSTRACT - This paper presents the Reverse Jacket Transform (RJT) and
a simple decomposition of its matrix, which is used to develop a fast
algorithm for the RJT. The matrix decomposition takes the form of
products of Hadamard matrices and successively lower order coefficient
matrices.

I.  INTRODUCTION 

The Hadamard transform is an orthogonal transform of high practical
value for representing signals and images, especially for the purposes
of data compression [1,2]. The reason for the practicality of this
transform is that the elements of the Hadamard matrix are either
$+2^0$ (= 1) or $-2^0$ (= -1). Thus, the computation of the transform
of a signal consists only of additions and subtractions of the signal
samples. The Walsh-Hadamard transform is the best known of the
non-sinusoidal orthogonal transforms. The Walsh-Hadamard matrix is
used for the Walsh representation of data sequences in image coding
and as a Hadamard-Walsh orthogonal sequence generator in CDMA spread
spectrum communication. Its basis functions are sampled Walsh
functions, which can be expressed in terms of the Hadamard matrices
$[H]_N$. Using the orthogonality of Hadamard matrices, we construct
generalized weighted Hadamard matrices [1,2], called $[RJ]_N$
matrices, with a reverse geometric structure. In this paper, $[RJ]_N$
and its five basic cases are described. $[RJ]_N$ is nonorthogonal, but
the Hadamard matrix, which is a special case of $[RJ]_N$ [1],[2], is
orthogonal. We propose a simple recursive factorization of $[RJ]_N$ in
terms of the Kronecker product of the $2 \times 2$ $[RJ]$ matrix and
Hadamard matrices of successively lower orders. A consequence of this
factorization is a simple and clear fast Hadamard transform algorithm
resulting from a block circulant sparse matrix factorization of the
$[RJ]$ matrix.
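
To make the "additions and subtractions only" remark concrete, here is a minimal Python sketch of the standard fast Walsh-Hadamard transform in natural (Sylvester) order. It is the textbook butterfly algorithm, shown only as background; it is not the Reverse Jacket factorization proposed in this paper, and the function name `fwht` is an assumption.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (Sylvester ordering, unnormalized).

    Uses only additions and subtractions; len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    y = list(x)
    h = 1
    while h < n:
        # Combine blocks of size h into blocks of size 2h (one butterfly stage).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return y
```

Applying `fwht` twice returns the input scaled by its length, which reflects the orthogonality of the Hadamard matrix.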


II.  THE  PROPOSED  RJT 

Using the orthogonality of Hadamard matrices, we construct weighted
Hadamard matrices. The $[RJ]_N$ matrices generalize the conventional
$[WH]_N$ and $[H]_N$ matrices [1],[2]. The RJT has a geometric
structure property. The basic idea of this paper was motivated by the
reversible jacket. Just as a two-sided jacket can be worn either
inside out or outside in, at least two positions of a Reverse Jacket
matrix $[RJ]_N$ are replaced by their inverses; these elements change
position and move, for example, from the inside of the middle circle
to the outside, or from the outside to the inside, without loss of
sign. This is the reason why we call it the Reverse Jacket matrix.

Definition 2.1

A $(2n \times 2n)$ matrix $A = (a_{ij})$, $n \in \mathbb{N}$, is called
Hamiltonian if $[AJ] = [AJ]^T$ with

$$J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix},$$

where $I \in \mathbb{R}^{n \times n}$ is the unit matrix.


Definition  2.2 

We define one more notion related to the Hadamard matrix
$[H]_{2^k} \in \mathbb{R}^{2^k \times 2^k}$. Let
$[RJ]_{2^k} \in \mathbb{R}^{2^k \times 2^k}$ be a $2^k \times 2^k$
matrix. A $2^k \times 2^k$ matrix $[RJ]_{2^k}$ such that

Mr  = 

is called the Reverse Jacket matrix, where $k$ belongs to the integers
$\mathbb{N}$ and $\mathbb{R}$ denotes the real numbers. All its
components are $\pm 2^n$ ($n = 0, 1, 2, \ldots$).


Fig. 1. Block-wise circulant sparse matrix pattern and sphere-like
circular sparse matrix pattern resembling a football

Fig. 1 shows the expanding block-wise circulant sparse matrix
structure. The figure is a plane surface; an interesting point is that
the block-wise circulant sparse matrix is characterized by a
football-like, rotating pattern. This means that when the
$2 \times 2$ sparse matrix is expanded to a $4 \times 4$ matrix, the
elements of the $2 \times 2$ sparse matrix follow the pattern of
Fig. 1.


III.  CONCLUSION 

The Reverse Jacket matrix is a generalization of the weighted Hadamard
matrix and the Hadamard matrix. The $[RJ]_N$ matrix has a recursive
structure and symmetric characteristics. The element positions of the
forward matrix can be replaced by those of its inverse matrix, and the
signs are not changed between the matrix and its inverse. The $[RJ]_N$
matrix has five basic symmetric cases according to the construction of
its elements. The Hadamard matrix is a special case of the Reverse
Jacket matrix. The fast $[RJ]_N$ transform algorithm is based on the
matrix decomposition into Hadamard matrices and successively lower
order weighted coefficient matrices.

Abstract - In this paper, we generalize Sylvester's construction for
(generalized) Hadamard matrices in such a way that $m$ matrices
$B_1, B_2, \ldots, B_m$ (not necessarily distinct) of the same size $k$
and a matrix $C$ of size $m$ are used as components to construct a
(generalized) Hadamard matrix of size $mk$.

In this paper, we will prove a construction for generalized Hadamard
matrices. This construction is a generalization of Sylvester's
construction. Even though it looks obvious, no such generalization has
appeared in the literature as far as both authors are aware.

Definition 1 Let $G$ be an abelian group of order $g$ written
additively. For a positive integer $\lambda$, a generalized Hadamard
matrix $GH(g, \lambda)$ is a $g\lambda \times g\lambda$ matrix
$[h(i,j)]$, where $1 \le i \le g\lambda$ and $1 \le j \le g\lambda$
denote the row and column indices, respectively, such that, for any
$i_1 \ne i_2$, every element of $G$ appears exactly $\lambda$ times in
the list $h(i_1, 1) - h(i_2, 1)$, $h(i_1, 2) - h(i_2, 2)$, ...,
$h(i_1, g\lambda) - h(i_2, g\lambda)$.

Remark 2 A Hadamard matrix of size $m$ is a $GH(2, m/2)$.

In this paper, we will consider only the generalized Hadamard matrices
over an abelian group, and abelian groups will be written additively
with operation denoted by +.

Theorem 3 We assume that there exists an $m \times m$ generalized
Hadamard matrix $C = [c_{ij}] = GH(g, \lambda_1)$ over $G$, where $G$
is an abelian group of order $g$ and $m = g\lambda_1$. We also assume
that there exist $B_1, B_2, \ldots, B_m$ which are (not necessarily
distinct) generalized Hadamard matrices $GH(g, \lambda_2)$ over $G$.
Then, the matrix

$$\begin{bmatrix}
c_{11} + B_1 & c_{12} + B_2 & \cdots & c_{1m} + B_m \\
c_{21} + B_1 & c_{22} + B_2 & \cdots & c_{2m} + B_m \\
\vdots & \vdots & & \vdots \\
c_{m1} + B_1 & c_{m2} + B_2 & \cdots & c_{mm} + B_m
\end{bmatrix} \qquad (1)$$

is a $g^2\lambda_1\lambda_2 \times g^2\lambda_1\lambda_2$ generalized
Hadamard matrix $GH(g, g\lambda_1\lambda_2)$ over $G$, where $c + B_k$
for $c \in G$ is the matrix obtained by adding $c$ to every component
of $B_k$.

Corollary 4 Using the same notation and assumptions of Theorem 3, the
matrix

$$\begin{bmatrix}
c_{11} + B_1 & c_{12} + B_1 & \cdots & c_{1m} + B_1 \\
c_{21} + B_2 & c_{22} + B_2 & \cdots & c_{2m} + B_2 \\
\vdots & \vdots & & \vdots \\
c_{m1} + B_m & c_{m2} + B_m & \cdots & c_{mm} + B_m
\end{bmatrix} \qquad (2)$$

is also a $g^2\lambda_1\lambda_2 \times g^2\lambda_1\lambda_2$
generalized Hadamard matrix $GH(g, g\lambda_1\lambda_2)$ over $G$.

Corollary 5 If $B_1 = B_2 = \cdots = B_m$ in the construction of
Theorem 3, the resulting generalized Hadamard matrix $H$ is of
Sylvester type.

Corollary 6 If the $B_k$'s are the same, except for some column
permutations, in the construction of Theorem 3, then the resulting
generalized Hadamard matrix $H$ is of Sylvester type up to some column
permutation.

Corollary 7 Let $H$ be constructed as in Theorem 3 using
$B_1, B_2, \ldots, B_m$ and $C$. Let $H'$ be constructed also as in
Theorem 3 using $B'_1, B'_2, \ldots, B'_m$ and the same $C$. If $B_k$
is the same as $B'_k$ except for some column permutation for
$k = 1, 2, \ldots, m$, then $H$ and $H'$ are the same except for some
column permutation.

This work was supported by the Basic Research Program of the Korea
Science and Engineering Foundation (KOSEF) under Grant Number
97-0100-0501-3.
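
The construction of Theorem 3 can be checked numerically on small cases. The Python sketch below assumes the group $G$ is the cyclic group $\mathbb{Z}_g$ (the theorem allows any abelian group); it places the block $c_{ij} + B_j$ in block row $i$ and block column $j$ and verifies the defining difference property of Definition 1 by brute force. The function names are hypothetical.

```python
from collections import Counter


def is_generalized_hadamard(h, g):
    """Brute-force check of the GH(g, lambda) property over Z_g."""
    n = len(h)
    if n == 0 or n % g or any(len(row) != n for row in h):
        return False
    lam = n // g
    for i1 in range(n):
        for i2 in range(i1 + 1, n):
            # Every group element must occur exactly lam times
            # among the entrywise differences of the two rows.
            diffs = Counter((a - b) % g for a, b in zip(h[i1], h[i2]))
            if any(diffs[e] != lam for e in range(g)):
                return False
    return True


def generalized_sylvester(c, blocks, g):
    """Matrix whose (i, j) block is c[i][j] + blocks[j], entries mod g."""
    m = len(c)
    k = len(blocks[0])
    result = []
    for i in range(m):
        for row_in_block in range(k):
            row = []
            for j in range(m):
                row.extend((c[i][j] + blocks[j][row_in_block][t]) % g
                           for t in range(k))
            result.append(row)
    return result
```

For example, the multiplication table of $\mathbb{Z}_3$ is a $GH(3, 1)$; using it as $C$ together with three column-permuted copies as the blocks yields a $9 \times 9$ matrix that passes the $GH(3, 3)$ check.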

Abstract - We solve the local large deviation problem (LDP) for the
shape of a random Young diagram when the lengths and heights of the
steps of the diagram shapes take values from given (and in general
different for length and height) sets of nonnegative integers.

The shape of a Young diagram $Y_n$ of weight $n$ is a piecewise
constant function with integer lengths and heights of the steps. In
[1] we established the local LDP for the shape of a random Young
diagram with uniform distribution. In [2] we considered the case when
the lengths (or, equivalently, the heights) of the steps take values
from a given set of positive integers $A \subset \{0, 1, \ldots\}$.

Here we continue our investigations and consider the case when the
lengths of the steps take values from a given set $A$ and the heights
take values from a (in general different) given set $B$. To solve this
problem we use some new considerations and some methods from our
previous works.

There exists a natural mapping between the Young diagrams of weight
$n$ and the unordered decompositions of $n$ into sums of natural
numbers. We consider the scaling of random Young diagrams of weight
$n$ obtained by dividing the linear sizes of the diagrams by
$\sqrt{n}$. Let $K_n$ be the shape of the scaled random Young diagram,
which in turn is a random curve. Define the function $L_1(z)$ and the
number $L_2$ by the following relations. Let $h_2(x)$ satisfy the
equality

$$\sum_{i \in A} e^{-iCx} \sum_{j \in B} e^{-j h_2} = 1,$$

where in turn the constant $C$ satisfies the relation

$$\int h_2(x)\, dx = C.$$

Next we put $L_2 = 2C$ and

$$L_1(z) = z h_1(z) + h_2(z),$$

where $h_1, h_2$ satisfy the equality

$$\sum_{i \in A} e^{-i h_1} \sum_{j \in B} e^{-j h_2} = 1$$

and $h_1(z)$ satisfies the relation

$$z = -\frac{d h_2(h_1)}{d h_1}.$$
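
The correspondence between partitions of $n$ and diagram shapes, together with the $\sqrt{n}$ scaling used above, can be sketched in Python as follows. The sketch adopts one common convention as an assumption: the shape is the nonincreasing step function whose value on $[i, i+1)$ is the $i$-th largest part, so the step lengths are the multiplicities of the parts and the step heights are the drops between consecutive distinct part sizes. The function names are hypothetical.

```python
import math


def steps(partition):
    """Return (length, height) pairs of the steps of the diagram shape.

    length = number of equal parts; height = drop to the next smaller
    distinct part (or to zero for the last step).
    """
    parts = sorted(partition, reverse=True)
    runs = []
    for p in parts:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    levels = [level for _, level in runs] + [0]
    return [(length, levels[i] - levels[i + 1])
            for i, (length, _) in enumerate(runs)]


def scaled_shape(partition):
    """Shape function with both axes divided by sqrt(n), n = sum of parts."""
    parts = sorted(partition, reverse=True)
    s = math.sqrt(sum(parts))

    def y(x):
        index = int(x * s)
        if x < 0 or index >= len(parts):
            return 0.0
        return parts[index] / s

    return y
```

After scaling, the area under the curve equals 1 for every partition, which is why the scaled shapes can be compared in a common $L^1$ space.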

Next we consider the set of functions $\mathcal{C} \subset L^1([0, \infty))$
such that for every $y \in \mathcal{C}$ there exists $\tilde{y} = y$
a.s. such that $\tilde{y}$ is monotonically nonincreasing and
nonnegative. Also, for every $0 < x_1 < x_2 < \infty$ the function
$\tilde{y}$ must satisfy the following relation


Lr  (  )  >  -oo. 


Note that from the definition of the function $L_1(z)$ and (1) it
follows that when $|A| < \infty$ or $|B| < \infty$, then $\tilde{y}$ is
continuous and

min  -  <  y  <  max  -  a.s. 

<£A,]£B  J  >£A,]£B  J 

The main result of this work is contained in the following theorem.


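Theorem 1 For the sequence $K_n$ the following relations are valid:

$$\lim_{\delta \to 0} \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{\ln P_{n(1 \pm \delta)}(K_n \in B(\epsilon, y))}{\sqrt{n}} = -N(y),$$

where

$$N(y) = \begin{cases} L_2 - \int_0^\infty L_1(-\tilde{y}'(x))\, dx, & y \in \mathcal{C}, \\ \infty, & y \notin \mathcal{C}, \end{cases}$$

and $B(\epsilon, y) = \{z \in L^1([0, \infty)) : \|z - y\| < \epsilon\}$
is the ball in $L^1$-space. The notation $P_{n(1 \pm \delta)}$ means
that we consider Young diagrams with weights in the range
$[n(1 - \delta), n(1 + \delta)]$.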

Abstract - The structure of perfect binary single-error-correcting
codes of length $n = 2^m - 1$ is investigated. The concepts of local
and interweight spectra of a code are introduced. They are
generalizations of the notion of the weight spectrum of a code.
Properties of the spectra of perfect codes are studied. The concept of
strong distance-invariance of a code is introduced and it is shown
that a perfect binary code is strong distance-invariant.

I.  Definitions 

A binary code $C$ of length $n$ is a subset of the $n$-cube (that is,
of the $n$-dimensional vector space over GF(2)).

Let $x$ and $y$ be vertices of the $n$-cube. We denote the Hamming
weight of the vertex $x$ by wt($x$) and the Hamming distance between
$x$ and $y$ by $\rho(x, y)$.

A code is distance-invariant if the number of all codewords at
distance $i \in \{0, 1, \ldots, n\}$ from a codeword $x$ does not
depend on the choice of the codeword $x$. A code is called
distance-regular if for all fixed integers
$i, j \in \{0, 1, \ldots, n\}$ the number of codewords $z$ such that
$\rho(x, z) = i$, $\rho(y, z) = j$ does not depend on the choice of
$x$ and $y$ but only depends on $\rho(x, y)$. We call a code strong
distance-invariant if for every codeword $x$ and all
$i, j, d \in \{0, 1, \ldots, n\}$ the number of codeword pairs
$(y, z)$ such that $\rho(x, y) = i$, $\rho(x, z) = j$ and
$\rho(y, z) = d$ does not depend on the choice of $x$. This property
is stronger than distance-invariance and weaker than
distance-regularity.
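
The definition of strong distance-invariance can be tested by brute force on small codes. The Python sketch below builds the Hamming code of length 7 (taking the columns of the parity-check matrix to be the binary representations of 1 to 7, one standard choice) and compares, for every codeword $x$, the counts of pairs $(y, z)$ indexed by the triple $(\rho(x,y), \rho(x,z), \rho(y,z))$. The function names are assumptions made for this illustration, and the check is only feasible for short codes.

```python
from collections import Counter
from itertools import product


def hamming_code_7():
    """All binary words of length 7 with zero syndrome, where the
    parity-check columns are the binary representations of 1..7."""
    code = []
    for word in product((0, 1), repeat=7):
        syndrome = 0
        for position, bit in enumerate(word, start=1):
            if bit:
                syndrome ^= position
        if syndrome == 0:
            code.append(word)
    return code


def distance(x, y):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(x, y))


def pair_profile(code, x):
    """Number of codeword pairs (y, z) for each triple (i, j, d)."""
    return Counter((distance(x, y), distance(x, z), distance(y, z))
                   for y in code for z in code)


def is_strong_distance_invariant(code):
    """True if the pair profile is the same for every codeword x."""
    profiles = [pair_profile(code, x) for x in code]
    return all(profile == profiles[0] for profile in profiles)
```

Because the Hamming code is linear, the check succeeds trivially here; the same function can be applied unchanged to a nonlinear perfect code given as a list of words.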

A $k$-dimensional face $\gamma$ of the $n$-cube is the set of all
vertices of the $n$-cube with $n - k$ fixed coordinates. A face
$\gamma^\perp$ is orthogonal to the face $\gamma$ if the set of fixed
positions of $\gamma^\perp$ and the set of free positions of $\gamma$
coincide. It is clear that the dimension of $\gamma^\perp$ is equal to
$n - k$ and that the intersection of two orthogonal faces consists of
a unique vertex.

Let $C$ be a binary code and $z$ be a vertex of the face $\gamma$. We
denote the number of codewords in the face $\gamma$ which are at
distance $i$ from the vertex $z$ by $v_i^C(\gamma, z)$. We call (see
[4]) the vector

$$v^C(\gamma, z) = (v_0^C(\gamma, z), v_1^C(\gamma, z), \ldots, v_{\dim \gamma}^C(\gamma, z))$$

a local spectrum of the code $C$ in the face $\gamma$ with respect to
the vertex $z$ (briefly, a $(\gamma, z)$-local spectrum of the code
$C$).

We denote the number of codeword pairs $(x, y)$ such that
wt($x$) = $i$, wt($y$) = $j$ and the distance between $x$ and $y$ is
equal to $d$ by $T_d^C(i, j)$. We call the vector

$$T^C(i, j) = (T_0^C(i, j), T_1^C(i, j), \ldots, T_n^C(i, j))$$

an $(i, j)$-weight spectrum of the code $C$, and the ordered set of
$(i, j)$-weight spectra with $0 \le i, j \le n$ an interweight
spectrum of the code $C$.


A perfect binary single-error-correcting code $C$ (briefly, a perfect
code) is a subset of the $n$-cube such that the set of balls of radius
1 with centers in $C$ is a partition of the $n$-cube.


II.  Results 

The local spectra of a perfect code in two orthogonal faces with
respect to their common vertex were proved to be tightly
interdependent (see [4]).

Theorem 1: Let $\gamma$ be a $k$-dimensional face of the $n$-cube and
$z$ be the common vertex of $\gamma$ and $\gamma^\perp$. The
$(\gamma^\perp, z)$-local spectrum of a perfect code $C$ is uniquely
determined by the $(\gamma, z)$-local spectrum of the code, and the
generating function of the sequence $v^C(\gamma^\perp, z)$ is

^-j-(l  +  t)n~k  +  (1  -t^-^l+t)^'* 


B-ir  (t 

0=0  N 


"(7,Z)  “ 


n  +  1 


Establishing the relations between the interweight and local spectra
of a perfect code and using Theorem 1, one can prove the following.

Theorem 2: The interweight spectrum of a perfect code is uniquely
determined by whether or not the code contains the all-zero vertex.

S.P. Lloyd [2] and H.S. Shapiro and D.L. Slotnik [3] proved that a
perfect binary code is distance-invariant. S.V. Avgustinovich and
F.I. Solov'eva [1] proved that among the perfect codes only the
Hamming codes of length 3 and 7 are distance-regular. From Theorem 2
we have

Theorem 3: A perfect code is strong distance-invariant.

The question of strong distance-invariance of other types of codes is
open.

Abstract - Efficient symbol error-correcting codes of distances 4 and
5 are presented.

I. Introduction

Error-correcting codes have been routinely applied to modern computer
memory subsystems. As the capacity of memory chips increases, the
applications of error-correcting codes have gradually shifted from
bit-oriented codes to symbol-oriented codes.

Symbol error-correcting codes of distances 4 and 5 have been
investigated by many researchers [1-5]. In particular, Dumer has
constructed several families of codes [3]. Feng et al. have improved
Dumer's results in the construction of codes of distance 5 [4]. For a
symbol of size $b$, let $q = 2^b$. Feng's codes have code length
$n = q^m$ and number of check symbols $r = \lceil 7m/3 \rceil + 1$,
for odd $m$.

We present a family of distance 4 codes that are more efficient than
those in [1-3]. In the case of distance 5 codes, we construct a family
of codes with the same parameters as [4] for even $m$, thus enlarging
the number of available codes for applications.

II.  Results 

We employ the technique illustrated in [5] for the construction of
symbol error-correcting codes. The codes obtained are called subspace
subcodes in [6]. The basic idea is to start with a linear code, not
necessarily a Reed-Solomon code, with symbols over a finite field. A
new code with a smaller symbol size is then obtained by consistently
deleting a fixed set of bits from the symbols of the original code.

For distance 4, we start with a code $C_0$ with symbols over
$GF(2^m)$ and parameters $n = 2^{2m} + 1$ and $r = 4$. The first row
of the parity-check matrix consists only of the field elements zero
and one. We then construct a code $C(b,c)$ with symbol size of $b$
bits. Let $W = (w_1, w_2, \ldots, w_n)$ be a word with $b$-bit
symbols, and let $v_i$ be the binary vector obtained from $w_i$ by
attaching $c$ zeros, $c = m - b$. Then $W$ is a code word of $C(b,c)$
if and only if $V = (v_1, v_2, \ldots, v_n)$ is a code word of $C_0$.
Code $C(b,c)$ is of length $n = 2^{2(b+c)} + 1$ with the number of
check bits $rb = 4b + 3c$. The number of check symbols $r$ is
$4 + 3c/b$. Let $q = 2^b$. Then $n = 2^{2c} q^2 + 1$. We have
$r = 1 + 1.5 \log_q(n - 1)$.


The number of check bits is smaller than that of a distance 4 code in
[4] for the same value of $n$.

A comparison of $C(b,c)$ with other known codes can be made when $r$
is an integer. Consider $b = 3c$. We have $r = 5$ and
$n = q^{8/3} + 1$. This is better than the codes in [2], which have
$r = 5$ and $n = 2q^2 + 2q + 1$ for $q > 4$, in that the code length
is longer. Consider next $b = 1.5c$. We have $r = 6$ and
$n = q^{10/3} + 1$. This is also better than the codes in [1], which
have $r = 6$ and $n = (q + 2)(q^2 + 1)$.
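
The parameter formulas for $C(b,c)$ quoted above are easy to tabulate. The following Python sketch evaluates them exactly and can be used to reproduce the two comparisons; the function name and the dictionary keys are assumptions made for this illustration.

```python
import math
from fractions import Fraction


def distance4_parameters(b, c):
    """Parameters of C(b, c): symbol size b bits, c deleted bits per symbol."""
    q = 2 ** b
    n = 2 ** (2 * (b + c)) + 1          # code length in symbols
    check_bits = 4 * b + 3 * c          # number of check bits, r * b
    r = Fraction(check_bits, b)         # number of check symbols, 4 + 3c/b
    return {"q": q, "n": n, "check_bits": check_bits, "r": r}


def check_symbols_from_length(q, n):
    """The closed form r = 1 + 1.5 * log_q(n - 1)."""
    return 1 + 1.5 * math.log(n - 1, q)
```

For instance, `distance4_parameters(3, 1)` corresponds to the case $b = 3c$ with $q = 8$, and `distance4_parameters(3, 2)` to the case $b = 1.5c$.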

For distance 5, we apply the same technique to a code of [4] with
$n = q^m$ and $r = \lceil 7m/3 \rceil + 1$ to yield a new code with
$n = 2^{mc} q^m$ and
$r = \lceil (7m + 3)/3 \rceil + \lceil 7m/3 \rceil c/b$. Let
$t = c/b$. The new code has $n = q^{(t+1)m}$ and
$r = (t + 1) \lceil 7m/3 \rceil + 1$. In particular, for $t = 1$ we
have $n = q^{2m}$ and $r = 2 \lceil 7m/3 \rceil + 1$. We thus have a
family of distance 5 codes whose length $n$ is an even power of $q$.

Abstract - We present a very simple multi-stage encoding scheme of
self-dual codes using an [8, 4, 4] extended Hamming code as a short
base code and bit permutations (or interleavers, as in Turbo-Codes
[1]) between stages. We describe several examples of interleavers in
order to build some extremal codes, with a minimum number of stages,
such as the [24, 12, 8] Golay code and the [32, 16, 8], [64, 32, 12],
and [88, 44, 16] codes. For length 32, we show how to build the 5
non-equivalent extremal QR, RM, F, G, U [32, 16, 8] self-dual codes.
We conjecture that this encoding scheme may find good rate-1/2 long
block codes [3].

An example of a [24, 12, 8] Golay code encoder built with an
[8, 4, 4] extended Hamming self-dual base code is shown in Figure 1.

Figure 1: 3-stage encoding of the [24, 12, 8] Golay code (Stage 0,
Stage 1 and Stage 2, separated by the permutations $\Pi_0$ and
$\Pi_1$).


The two identical permutations $\Pi_0$ and $\Pi_1$ map the ordered
integer set (0, 1, 2, ..., 11) into
(0, 2, 4, 6, 8, 10, 1, 3, 5, 7, 9, 11). The even (resp. odd) bits are
permuted to the upper (resp. lower) bits. In general, the permutations
are not identical and not unique for a given code. The first stage
(stage 0) transforms the input information bit vector
$X = (x_0, x_1, \ldots, x_{k-1}) = X^{(0)}$ into the "redundancy" bit
vector $R^{(0)} = (r_0^{(0)}, r_1^{(0)}, \ldots, r_{k-1}^{(0)})$; then
the bits of $R^{(0)}$ are permuted (interleaved) by the permutation
$\Pi_0$ to provide the input vector $X^{(1)}$ of stage 1, and so on
until the last stage $(s - 1)$, which outputs $R^{(s-1)}$. The
codeword $C$ is the concatenation of the input information bit vector
and the output redundancy vector of the last stage:
$C = (X^{(0)}, R^{(s-1)})$. Table 1 summarizes the codes built with
permutations defined by the affine map
$\pi_{a,b}(z) = a z + b \pmod{k}$ with $a, b \in \mathbb{Z}$.

Conway and Pless [2] have shown that there exist only 5 non-equivalent
extremal type-II self-dual codes of length $n = 32$: the
QR[32, 16, 8] and RM(2, 5) codes, and the codes called F, G, and U.
With identical (at all stages) permutations associated with the linear
maps over GF(16) defined by
$z \to \alpha^a z + \alpha^b z^2 + \alpha^c z^4 + \alpha^d z^8$, with
$a, b, c, d \in \{0, 1, \ldots, 14\}$, where $\alpha$ is a primitive
element of GF(16), we have found only the QR, G and U codes (cf.
Table 2). However, we have built all five [32, 16, 8] codes, with a
minimum of 3 stages, using the pairs of non-identical permutations
given in Table 3.
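
The staged encoding described above can be sketched in Python as follows. Two details are assumptions of this sketch rather than statements from the paper: the base code is taken in the systematic form $[I \mid P]$ with $P$ the all-ones matrix minus the identity (one standard generator of the [8, 4, 4] extended Hamming code), and a permutation `perm` is applied so that position `j` of the next stage's input receives bit `perm[j]` of the previous redundancy vector. With a different choice of base-code generator the same permutations may produce a different code, so the sketch shows the structure of the scheme only.

```python
# Redundancy part P of a systematic [8,4,4] extended Hamming code [I | P].
P = [(0, 1, 1, 1),
     (1, 0, 1, 1),
     (1, 1, 0, 1),
     (1, 1, 1, 0)]


def base_redundancy(block):
    """Four redundancy bits of the base code for four input bits."""
    return [sum(block[i] * P[i][j] for i in range(4)) % 2 for j in range(4)]


def stage(bits):
    """One stage: split the input into 4-bit blocks and encode each block."""
    if len(bits) % 4:
        raise ValueError("input length must be a multiple of 4")
    out = []
    for start in range(0, len(bits), 4):
        out.extend(base_redundancy(bits[start:start + 4]))
    return out


def multistage_encode(info, perms):
    """Encode with len(perms) + 1 stages; returns (info, last redundancy)."""
    redundancy = stage(list(info))
    for perm in perms:
        interleaved = [redundancy[p] for p in perm]
        redundancy = stage(interleaved)
    return list(info) + redundancy
```

For a 3-stage rate-1/2 encoder of length 24, `perms` would hold two permutations of (0, 1, ..., 11), for example two copies of (0, 2, 4, 6, 8, 10, 1, 3, 5, 7, 9, 11).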


Code [n, k, d_min]     a, b      Stages
[16, 8, 4]             1, 0      3
Golay [24, 12, 8]      5, 1      3
G [32, 16, 8]          3, 0      3
[40, 20, 8]            3, 0      3
[56, 28, 12]           5, 1      3
[64, 32, 12]           19, 0     3
[72, 36, 12]           5, 0      3
[88, 44, 16]           35, 0     7

Table 1: Codes obtained with $z \to a z + b$ permutations.


Code [32, 16, 8]

Permutation 

Stages 

QR[32, 16, 8]

a1  z2  +  z4  +  az8 

3 

G[32, 16, 8] 

a'z 8 

3 

U[32, 16, 8]

a'z2  +  z4  +  az8 

7 

Table 2: Length 32 extremal codes built with permutations associated
with linear maps over GF(16).


Code   Permutations
QR     $\Pi_0$ = 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
       $\Pi_1$ = 0, 13, 6, 7, 4, 1, 10, 15, 8, 5, 14, 3, 12, 9, 2, 11
RM     $\Pi_0$ = 0, 4, 8, 12, 1, 2, 9, 13, 3, 5, 6, 14, 7, 10, 11, 15
       $\Pi_1$ = 0, 5, 14, 3, 4, 13, 10, 7, 8, 9, 2, 15, 12, 1, 6, 11
F      $\Pi_0$ = 0, 4, 5, 8, 1, 2, 6, 12, 3, 7, 9, 13, 10, 11, 14, 15
       $\Pi_1$ = 0, 1, 10, 11, 4, 13, 6, 15, 8, 9, 2, 7, 12, 5, 14, 3
G      $\Pi_0$ = 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
       $\Pi_1$ = 0, 13, 6, 11, 4, 1, 10, 15, 8, 5, 14, 3, 12, 9, 2, 7
U      $\Pi_0$ = 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
       $\Pi_1$ = 0, 13, 6, 3, 4, 1, 10, 11, 8, 5, 14, 7, 12, 9, 2, 15

Table 3: Permutations of the 5 extremal [32, 16, 8] codes.

Abstract - A new block code is introduced which is capable of
correcting multiple insertion, deletion and substitution errors
present in a single block. An inner code resilient to synchronisation
errors provides soft inputs to an outer code capable of correcting
substitution errors. The decoder does not require knowledge of the
block boundaries.

Many coding methods have been proposed to cope with synchronisation
errors. Most fall into one of two categories, either correcting
limited synchronisation errors [1, 2] or imposing run-length limiting
constraints [3]. In this paper we present a block code capable of
correcting multiple synchronisation and substitution errors using a
probabilistic decoder. We apply the code to a model binary channel
with an input queue. At each use, one of three events occurs. With
probability $P_i$ a random bit is inserted into the received stream.
With probability $P_d$ the next queued bit is deleted. With
probability $P_t = 1 - P_d - P_i$ the next queued bit is transmitted,
with a probability $P_s$ of suffering a substitution error.
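
The channel model just described can be simulated directly. The following Python sketch is one straightforward reading of it, with the function name and the explicit random generator argument chosen for this illustration; it processes the input queue until every queued bit has been either deleted or transmitted.

```python
import random


def ids_channel(bits, p_ins, p_del, p_sub, rng):
    """Simulate the insertion/deletion/substitution channel on a bit list.

    At each channel use: insert a random bit (prob. p_ins), delete the
    next queued bit (prob. p_del), or transmit it (otherwise), flipping
    it with probability p_sub.
    """
    received = []
    position = 0
    while position < len(bits):
        u = rng.random()
        if u < p_ins:
            received.append(rng.randrange(2))      # insertion
        elif u < p_ins + p_del:
            position += 1                          # deletion
        else:
            bit = bits[position]
            if rng.random() < p_sub:
                bit ^= 1                           # substitution
            received.append(bit)
            position += 1
    return received
```

Passing a seeded generator such as `random.Random(0)` makes a simulation run repeatable.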

The construction of Watermark codes is outlined in Figure 1. We first
encode our message m into a vector d of length $N$ using a standard
outer error-correcting code. We use low-density parity-check codes
defined over the field $GF(q = 2^k)$ because they can easily utilise
the soft information provided by the inner code. Low-density
parity-check codes [4] are currently the best known error-correcting
codes for Gaussian channels [5, 7].


Fig.  1:  Watermark  codes 


For the inner code, we choose a fixed binary vector w of length
$n \times N$, for some $n > k$, which we call the watermark. The
watermark is known to both encoder and decoder. Suitable choices for
w include pseudo-random and run-length limited sequences. The encoder
maps outer codewords d to sparse messages s of length |w| by mapping
each $q$-ary symbol of d to one of the $q$ sparsest patterns of length
$n$.

Next we form the transmitted vector t := w + s mod 2. If the message
vectors s are constrained to be sufficiently sparse, and
synchronisation errors are sufficiently rare, it is possible for a
Hidden Markov Model decoder to recover synchronisation with a small
probability of error.
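
The inner encoding step (sparse mapping followed by addition of the watermark) can be sketched in Python as below. The ordering of the $q$ sparsest patterns, and hence which symbol maps to which pattern, is an arbitrary assumption of this sketch: patterns are listed by increasing weight and, within a weight, in the order produced by `itertools.combinations`. The function names are hypothetical.

```python
from itertools import combinations


def sparsest_patterns(n, q):
    """The q lowest-weight binary patterns of length n, lightest first."""
    patterns = []
    for weight in range(n + 1):
        for ones in combinations(range(n), weight):
            patterns.append(tuple(1 if i in ones else 0 for i in range(n)))
            if len(patterns) == q:
                return patterns
    raise ValueError("q exceeds the number of patterns of length n")


def watermark_encode(symbols, watermark, n, q):
    """Map each q-ary symbol to a sparse pattern and add the watermark mod 2."""
    table = sparsest_patterns(n, q)
    sparse = [bit for symbol in symbols for bit in table[symbol]]
    if len(sparse) != len(watermark):
        raise ValueError("watermark length must equal n times the number of symbols")
    return [w ^ s for w, s in zip(watermark, sparse)]
```

When no synchronisation errors occur, adding the watermark again recovers the sparse vector, from which the symbols follow by table lookup; the probabilistic decoder in the paper handles the general case.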


The inner decoder takes the noisy received vector r and returns an a
posteriori distribution for each $q$-ary symbol of d. It should be
noted that the decoder does not know the position of the block
boundaries. The outer decoder takes the output of the Watermark
decoder and attempts to recover the codeword d and corresponding
message m.

Watermark codes can communicate very effectively over
insertion/deletion channels. Figure 2 shows rate 1/2, blocklength
4600 watermark codes reaching a block error rate of $10^{-3}$ with
roughly 100 synchronisation errors scattered throughout each block.
This compares favourably to previous reports [2] of rate 1/2,
blocklength 15840 codes achieving a similar block error rate for a
channel that made insertion/deletion bursts of expected length 6 on
average once every 9 blocks.


Fig. 2: Performance of concatenated watermark codes with overall
rate between 0.7 and 0.05 and blocklengths roughly 5000 bits.
Outer codes were regular low-density parity-check codes with mean
column weights between 2.6 and 3. The channel substitution
probability $P_s$ was zero. Vertical axis: block error rate. Horizontal axis:
insertion/deletion probability. Figure reproduced from [6].

Abstract — We present a fast algorithm for finding
the roots of polynomials over function fields that can
be used to speed up the list decoding of algebraic
geometric codes.

I.  Introduction 

Suppose C is an [n, k, d] code over a finite field $F_q$, and
let $t < n$ be a positive integer. For any received vector
$y = (y_1, \ldots, y_n) \in F_q^n$, we refer to any code word c in C
satisfying $d(c, y) \le t$ as a t-consistent code word. Let
$\tau = \lfloor (d-1)/2 \rfloor$. It
is clear that in any Hamming sphere in $F_q^n$ of radius $\le \tau$, there
exists at most one code word of C. On the other hand, if $t > \tau$
then there may exist several distinct t-consistent code words.
We call $\tau$ the error correction bound of the code. Classical
decoding considers only decoding algorithms which
can correct $\tau$ or fewer errors. A list decoding algorithm is a
decoding algorithm which tries to construct a list of all t-consistent code
words, where t can be greater than $\tau$. Thus, a list decoding
algorithm makes it possible to recover the information from
errors beyond the traditional error correction bound.
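To make the definition concrete, a brute-force list decoder can be sketched as below. It is only an illustration: the names `hamming` and `list_decode` and the four-word toy code are hypothetical, the distance condition is read as $d(c, y) \le t$, and exhaustive search over the code is of course not the efficient algorithm discussed in this paper.

```python
def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def list_decode(code, y, t):
    """Return every code word within distance t of the received word y."""
    return [c for c in code if hamming(c, y) <= t]


# Toy binary code of length 5 with minimum distance 3 (assumed example).
CODE = ["00000", "11100", "00111", "11011"]
```

For the received word 10100 the radius t = 1 (the error correction bound of this toy code) yields the single code word 11100, while t = 2 yields the list 00000, 11100.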

In this paper, motivated by Roth and Ruckenstein's work [2],
we propose an efficient algorithm for finding the roots of
polynomials over function fields. This algorithm can be used
to speed up the list decoding of algebraic geometric codes.

II.  List  Decoding  for  AG  Codes 

Recently, Shokrollahi and Wasserman [3] proposed a list
decoding algorithm for low-rate algebraic geometric codes,
generalizing the results of Sudan [4] for Reed-Solomon codes.
Guruswami and Sudan [1] then proposed an improved list decoding
algorithm applicable to high-rate algebraic geometric
codes and Reed-Solomon codes as well. The list decoding
algorithm consists of two main steps. The first step is to find
a nonzero univariate polynomial H(T) over the function field
$\mathcal{K}$ of the curve. This step can be reduced to the solution of a
system of homogeneous linear equations, which can be implemented
with low complexity using Gaussian elimination. The
second step is to find the roots of the polynomial H(T) in
a rational function space L(G). Shokrollahi and Wasserman [3]
and Guruswami and Sudan [1] proposed factorization (or
root-finding) algorithms to find the roots of H(T). However,
the implementation of these algorithms is rather complicated.

In  [2],  Roth  and  Ruckenstein  presented  an  efficient  list 
decoding  algorithm  for  low-rate  Reed-Solomon  codes,  based 
upon  [4].  They  reduced  the  complexity  of  the  second  step,  the 
codeword  reconstruction,  by  means  of  an  efficient  algorithm 
for  finding  roots  of  univariate  polynomials  over  polynomial 
rings. 
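For orientation, the root-finding idea of [2] for polynomial rings can be sketched in a few lines. The code below is an illustrative simplification and not the function-field algorithm of this paper: it assumes a prime field GF(p), finds roots in the field by brute force, and uses hypothetical names (`strip_x`, `shift_y`, `rr_roots`). A bivariate polynomial Q(x, y) is stored as a list whose j-th entry holds the coefficients, lowest degree first, of the x-polynomial multiplying y^j, already reduced mod p.

```python
from math import comb


def strip_x(Q):
    """Divide Q(x, y) by the largest power of x dividing it (Q nonzero)."""
    r = min(i for q in Q for i, c in enumerate(q) if c)
    return [q[r:] for q in Q]


def shift_y(Q, g, p):
    """Return Q(x, x*y + g) with all coefficients reduced mod p."""
    width = max(len(q) for q in Q)
    out = []
    for i in range(len(Q)):
        acc = [0] * width
        for j in range(i, len(Q)):
            w = comb(j, i) * pow(g, j - i, p) % p
            for d, c in enumerate(Q[j]):
                acc[d] = (acc[d] + w * c) % p
        out.append([0] * i + acc)  # the factor x**i shifts coefficients up
    return out


def rr_roots(Q, k, p):
    """All f = f_0 + ... + f_{k-1} x^{k-1} over GF(p) with Q(x, f(x)) = 0."""
    roots = []

    def descend(Q, prefix):
        if len(prefix) == k:
            if not any(Q[0]):  # the transformed polynomial vanishes at y = 0
                roots.append(tuple(prefix))
            return
        M = strip_x(Q)
        for g in range(p):
            value = sum((q[0] if q else 0) * pow(g, j, p)
                        for j, q in enumerate(M)) % p
            if value == 0:  # g is a root of M(0, y): candidate coefficient
                descend(shift_y(M, g, p), prefix + [g])

    descend(Q, [])
    return sorted(roots)
```

As a check, over GF(5) the product (y - (1 + 2x))(y - (3 + x)) expands to y^2 + (1 + 2x)y + (3 + 2x + 2x^2), and the sketch recovers the two linear roots coefficient by coefficient, lowest degree first.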

1 This work was supported in part by Grant No. NCR-9612802
from the NSF and by a research grant from the National Storage
Industry Consortium.

III. Fast Algorithm

We have extended the efficient root-finder in Roth and
Ruckenstein's reconstruction algorithm to the class of univariate
polynomials over the function field of any curve in
m-dimensional projective space. Using this extension, we obtain
an efficient list decoding algorithm for algebraic geometric
codes.

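Let $H(X;T) = h_0(X) + h_1(X)T + \cdots + h_s(X)T^s$, where
$h_j(X) \in L((l - j\rho)P)$. Suppose $f(X) \in L(G) = L(\rho P)$ is such
that $H(X; f(X)) = 0$. Let $\{\varphi_1, \varphi_2, \ldots, \varphi_k\}$ be a basis of
$L(\rho P)$ such that $\varphi_i$ has a pole only at P and the order of the
pole is $\rho_i$, i.e., $\mathrm{ord}_P(\varphi_i) = -\rho_i$, where $\rho_i$ is the i-th nongap at
P. We can assume $f(X) = f_1\varphi_1(X) + f_2\varphi_2(X) + \cdots + f_k\varphi_k(X)$,
where $f_i \in F_q$. We can now find $f_k, f_{k-1}, \ldots, f_1$ by the
following procedure.

Set $G_1(X;T) = H_1(X;T) = H(X;T)$ and $\tilde{G}_1(X;T) = G_1(X; \varphi_k T)$.
Then $\tilde{G}_1(X;T) = \sum_{j=0}^{s} h_j(X)\varphi_k(X)^j T^j$. Let
$-\rho_{r_1} = \min\{\mathrm{ord}_P(h_j\varphi_k^j) \mid j = 0, 1, \ldots, s\}$. Suppose $\varphi_{r_1}$ is a
rational function with $\mathrm{ord}_P(\varphi_{r_1}) = -\rho_{r_1}$. Divide $\tilde{G}_1(X;T)$ by
$\varphi_{r_1}$, and let $\bar{G}_1(X;T) = \tilde{G}_1(X;T)/\varphi_{r_1}$. Then $\bar{G}_1(P;T)$ is a
nonzero polynomial in $F_q[T]$.

On the other hand, by $H(X; f(X)) = 0$, we have

$\bar{G}_1\left(X; \frac{f(X)}{\varphi_k(X)}\right) = 0.$  (1)

Since $\mathrm{ord}_P(\varphi_j/\varphi_k) = \rho_k - \rho_j > 0$ for $j = 1, \ldots, k-1$, we have
$\frac{f}{\varphi_k}(P) = f_k$. By (1), we have $\bar{G}_1(P; f_k) = 0$. So by solving
$\bar{G}_1(P;T) = 0$, we can get $f_k$.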

This derivation can be applied inductively to determine the
remaining coefficients $f_{k-1}, \ldots, f_1$ of a root f(X) of H(T) = 0.

Theorem: Let H(T) be a nonzero polynomial of degree s
in $\mathcal{K}[T]$ that is returned in the first step of the list decoding
algorithm. Then the roots of H(T) in $L(G) = L(\rho P)$ can be
determined by the root-finding procedure described above. This
root-finding procedure requires $O(ks(n^2 + s^2 + \log^2 s \cdot \log\log s \cdot \log q))$
operations over $F_q$ and $O(ks^2)$ operations over $\mathcal{K}$.

I. Introduction

Recently Sudan [1] presented a decoding method for RS
codes beyond the error-correction bound t, i.e. half of
the minimum distance. This is a list decoding which gives
as its output a list of all codewords within a specified
distance t' from the received word, where t' > t. On the one hand
this method can be generalized to a list decoding of one-point
algebraic-geometric (AG) codes [2], and on the other hand
improved versions for larger rates have been given [3]. Sudan's
algorithm is composed of two major stages. At the first
stage, in the case of RS codes, one must find a bivariate
polynomial f(x, y) having a prescribed set of zeros and a minimal
(multi-)degree with respect to a certain admissible total order
over the set of integer vectors $Z_0^2$, so that its factors of the form
y - g(x) give the codewords in the required list at the second
and last stage, the factorization of f(x, y). Though the
factorization problem has more complexity in Sudan's algorithm,
the interpolation problem is significant by its own nature and
important as well.

In a general interpolation problem in the bivariate polynomial
ring K[x, y] over a finite field K, one is required to find a
set of polynomials f(x, y) subject to the condition
$f(\alpha_i, \beta_i) = \gamma_i$, $1 \le i \le n$, for a given set
$\{(\alpha_i, \beta_i, \gamma_i) \mid 1 \le i \le n\} \subset K^3$,
along with some additional constraints or prerequisites. Since
this problem is equivalent to a nonhomogeneous system of linear
equations for the unknown coefficients of the polynomials f(x, y),
its generic solution is given as the sum of a particular
solution of the nonhomogeneous system and the generic solution
of the homogeneous system corresponding to the condition
$f(\alpha_i, \beta_i) = 0$, $1 \le i \le n$. Thus, the latter type of
interpolation problem, where one is required to find polynomials with
preassigned zeros, is substantial. Furthermore, the generic
solution of this homogeneous system has a mathematically clear
meaning as follows:

For a finite subset $V := \{(\alpha_i, \beta_i) \mid 1 \le i \le n\} \subset K^2$, the set
of polynomials $I(V) := \{f(x,y) \in K[x,y] \mid f(\alpha_i, \beta_i) = 0,\ 1 \le i \le n\}$
is an ideal of the ring K[x, y], and any element of I(V)
having a minimal degree can be found in a Gröbner
basis of the ideal I(V) with respect to the specific total order.
Though some other efficient algorithms to solve this
interpolation problem have been given [4] [5], they miss the above
crucial observation, so that each of them is of its own special
and separate form.
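The homogeneous interpolation condition can be illustrated by the Gaussian-elimination baseline that the text compares against. The sketch below is an assumption-laden illustration, not the BMS-based method of this paper: it works over a prime field GF(p), the function name `vanishing_poly` is hypothetical, and the caller supplies the monomials (a, b), standing for x^a y^b, already sorted by the chosen total order. Because the first free column of the reduced matrix is used, the returned polynomial has the smallest possible leading monomial in that order.

```python
def vanishing_poly(points, monomials, p):
    """Return {(a, b): coeff} for a nonzero polynomial vanishing on points.

    The polynomial is monic in the first monomial that is linearly
    dependent on the earlier ones; returns None if no such monomial exists.
    """
    rows = [[pow(x, a, p) * pow(y, b, p) % p for (a, b) in monomials]
            for (x, y) in points]
    ncols = len(monomials)
    pivots = {}  # pivot column -> row index in reduced row echelon form
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], p - 2, p)  # inverse via Fermat's little theorem
        rows[r] = [v * inv % p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(u - f * v) % p for u, v in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    fc = free[0]
    coeffs = [0] * ncols
    coeffs[fc] = 1
    for c, ri in pivots.items():
        coeffs[c] = (-rows[ri][fc]) % p
    return dict(zip(monomials, coeffs))
```

For three points on the line y = x + 1 over GF(7) and monomials ordered 1, x, y, x^2, xy, y^2, the result is y - x - 1 (written with coefficients mod 7).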

In this paper, we present an efficient algorithm to find a
Gröbner basis of the ideal I(V) based on the
Berlekamp-Massey-Sakata (BMS) algorithm [6] [7], which gives another efficient
method of obtaining the solution at the first stage of Sudan's
algorithm. Furthermore, we show that the above interpolation
problem can be generalized to finding a Gröbner basis of
the ideal I(V; M), which consists of polynomials having zeros
$(\alpha_i, \beta_i) \in V$ with some multiplicity condition specified by
a set $M\ (\subset Z_0^2)$ of integer vectors. This Hermitian type of
interpolation problem plays a role in the improved version of
Sudan's algorithm [3]. A modification [8] of the BMS algorithm can
be applied to solve this problem. On the other hand, for list
decoding of one-point AG codes our method can be adapted
to find a Gröbner basis of a relevant ideal.

II.  Conclusion 

The BMS algorithm can be applied efficiently not only to
the conventional bounded-distance decoding of one-point AG
codes up to the Feng-Rao designed distance but also to list
decoding of RS codes and one-point AG codes. As a result of a
simple analysis of computational complexity in the improved
version of Sudan list decoding of RS codes for the number
t' of correctable errors, where $t' \approx n - \sqrt{hkn}$ for a constant
h > 1, we have the following estimate. Based on our method,
the  first  interpolation  stage  has  complexity  0(k~^n^mz)  (in 
comparison with $O(n^3 m^6)$ based on Gaussian elimination),
where m is the required multiplicity of zeros in the improved
Sudan algorithm.

Abstract — We derive upper bounds on the number
of errors that can be corrected by list decoding of
MDS codes using small lists. We show that the performance
of Reed-Solomon codes, for certain parameter
values, is limited by worst-case codeword configurations,
but that with randomly chosen codes over large
alphabets, more errors can be corrected.

I.  Introduction 

In most cases, error patterns with slightly more than $\frac{N-K}{2}$
errors can be corrected by an (N, K) maximum distance
separable (MDS) code. However, until the publication of
Sudan's algorithm [1], the complexity of the computation
increased quickly with the number of additional errors, and
in practice only a single additional error could be corrected
with an acceptable amount of computation. Sudan's paper
not only gives an algorithm which allows more errors to be
corrected, it also gives a proof that for a limited number of
errors, the correct codeword is always on a very small list of
possible transmitted words. In [2] the algorithm was extended
to all rates, and the authors obtained a bound for list-of-j decoding
consisting of a sequence of straight lines meeting at the points
$(r, t) = \left(\frac{s(s-1)}{j(j+1)},\ 1 - \frac{s}{j+1}\right)$, $s = 1, 2, \ldots, j+1$,
where r is the rate of the code and t is the fractional number
of errors that can be corrected. For large j, the fractional
number of errors approaches $1 - \sqrt{r}$.
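The corner points of this piecewise-linear bound are easy to tabulate. The following sketch assumes the corner-point formula reads (r, t) = (s(s-1)/(j(j+1)), 1 - s/(j+1)) for s = 1, ..., j+1; the function name `corner_points` is hypothetical, and exact rational arithmetic is used so that the values can be compared directly.

```python
from fractions import Fraction
from math import sqrt


def corner_points(j):
    """Corner points (rate r, error fraction t) of the list-of-j bound."""
    return [(Fraction(s * (s - 1), j * (j + 1)), 1 - Fraction(s, j + 1))
            for s in range(1, j + 2)]


def max_gap_to_sqrt_bound(j):
    """Largest |t - (1 - sqrt(r))| over the corner points for list size j."""
    return max(abs(float(t) - (1 - sqrt(r))) for r, t in corner_points(j))
```

For j = 2 the corners are (0, 2/3), (1/3, 1/3) and (1, 0); for j = 100 every corner lies within 0.01 of the curve t = 1 - sqrt(r), in line with the limiting behaviour stated above.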

The  aim  of  this  paper  is  to  study  codeword  configurations 
in  order  to  derive  bounds  on  the  number  of  errors  that  can 
be  corrected  with  any  list  decoding  algorithm  when  the  size 
of  the  list  is  some  fixed  number.  The  bounds  coincide  with 
Sudan’s  bounds  indicating  that  the  limitation  is  not  related 
to  any  particular  decoding  algorithm. 

II.  A  GENERAL  OBSERVATION 

Suppose we have j + 1 codewords

$u_i = (u_{i1}, u_{i2}, \ldots, u_{iN})$, $i = 1, 2, \ldots, j+1$

at mutual distance D with the property that in any coordinate
the j + 1 words have the same symbol in s words and different
symbols in the remaining j + 1 - s words.

From such a configuration we can get a balanced incomplete
block design by taking the j + 1 codewords as points and letting
the N blocks be the sets of codewords that have the same symbol in
the first, the second, ..., the N-th coordinate.

The parameters of the design are $v = j + 1$, $k = s$, $\lambda = N - D$,
and $b = N$.

Necessary conditions for the existence of such a design are

$(s - 1) \mid (N - D)j$  (1)

$s(s - 1) \mid (N - D)j(j + 1)$  (2)

$(N - D)j(j + 1) = Ns(s - 1)$  (3)
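These conditions are simple to check numerically. The sketch below assumes they read as the two divisibility conditions and the counting identity of a block design with v = j + 1, k = s, lambda = N - D and b = N; the function name `design_conditions` is hypothetical and s >= 2 is assumed so that the divisors are nonzero.

```python
def design_conditions(N, D, j, s):
    """Check the three necessary conditions for the codeword configuration."""
    lam = N - D  # number of coordinates in which two codewords agree
    divides_1 = (lam * j) % (s - 1) == 0
    divides_2 = (lam * j * (j + 1)) % (s * (s - 1)) == 0
    counting = lam * j * (j + 1) == N * s * (s - 1)
    return divides_1 and divides_2 and counting
```

As an example, the Fano plane (7 points, 7 blocks of size 3, every pair of points in exactly one block) corresponds to j = 6, s = 3, N = 7, D = 6 and passes all three checks, whereas changing D to 5 violates the counting identity.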

Tom Høholdt
Department  of  Mathematics 
Technical  University  of  Denmark, 

Bldg. 303 

DK-2800  Lyngby,  Denmark 
e-mail:  T.Hoeholdt@mat.dtu.dk 

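If such a configuration of codewords exists and we let w be the
word that has the shared symbol in each coordinate, then each
codeword lies in $\frac{Ns}{j+1}$ blocks and so agrees with w in that many
coordinates. The distance between w and each of the codewords is
therefore $T = N - \frac{Ns}{j+1}$, the code has rate
$\frac{(N-D)+1}{N} = \frac{s(s-1)}{j(j+1)} + \frac{1}{N}$, and $t = \frac{T}{N} = 1 - \frac{s}{j+1}$.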

If  we  let  A  =  2~')  and  N  =  the  existence  of  such  a 

block design (for sufficiently large l) follows from a theorem of
Wilson [3]. In the special case where l = 1 and j + 1 is relatively
prime to 2, 3, ..., s - 1 there is a nice direct construction

W- 

This  gives  the  following  theorem 

Theorem 1 Let j and s be natural numbers, s < j + 1. Then
there  exists  a  natural  number  m  such  that  for  l  >  m  the  Ham¬ 
ming  space  Fq  where  N  =  ^  ^2+1d  contains  j  +  1  vectors  of 
mutual  distance  D  =  N—  0  in  a  sphere  of  radius  N-  jyy. 

Remark  1  One  can  prove  that  if  N  —  'Art U )  then  the  small¬ 
est  radius  of  a  sphere  containing  j  +  1  words  of  mutual  distance 
at  least  N  -  is  N  - 

It  turns  out  that  using  classical  designs  from  points  and  hy¬ 
perplanes  of  PG(m,q)  one  can  actually  get  the  corresponding 
R-S  codewords.  This  leads  to  the  following: 

Theorem  2  Suppose  j  4-  1  =  ,  where  q  is  a  prime  power 

and m is a natural number. Then there exist Reed-Solomon

m  —  2  i  _ r\  m-2/._  i\2 

codes  of  rates  1  —  and  - — "  suc h  ™at  we  can  only 

be  sure  to  have  the  correct  codeword  on  a  list  of  size  j  if  the 
fractional  number  of  errors  t  satisfy  t  <  1  —  and  t  < 

1  —  q  respectively. 

This gives $t < 1 - \sqrt{r}$ for large j.

III.  An  UPPER  BOUND  FOR  RANDOMLY  CHOSEN  CODES 
We  have  also  studied  list  decoding  of  randomly  chosen 
codes  and  could  prove  the  following 

Theorem 3 There exist (q - 1, k) codes over $F_q$ such that
list-of-j decoding is possible for any $t < (1 - r)j/(j + 1)$.

In a particular case we have that if m is at least 10, there
are codes over $F_{2^m}$ of rate 1/2 that allow more than N/4 errors
to be corrected with list-of-2 decoding.

Abstract — Vector symbol decoding is an outer code decoding technique for a concatenated code
which works with an (n-k) x r syndrome matrix S of n-k linear combinations of r-bit inner code symbol
vectors. A Gauss-Jordan reduction provides an error location vector. The zeroes in the error location
vector are the apparent error positions, and the number of zeroes should match the rank of S. Then
the error values can be solved for. Decoding success is related to the linear independence of error
vectors. Data scrambling techniques for inner code symbols can make linear independence likely for
moderately large r. Any parity check code structure can be used for the combination rules.

The new result is that, if the outer code decoder has available a (possibly ordered) list of two
or more alternative decisions for some or all inner symbols, a slightly modified vector symbol
decoder automatically reveals most correct alternatives, allowing more powerful and often simpler
correction. Moreover, the ability to recognize these alternatives does not require error vector linear
independence.

The main idea is to store differences between alternative choices and the first choice as
additional rows below the syndrome matrix S. Let x = first choice, y = second choice. If y is correct,
the stored x - y = e, the true error. When column operations are done on the augmented matrix to
transform the n-k rows of S, if y is correct, e will almost always be directly revealed as a member of
the row space of S. Also, its position is known by construction. A simple theorem shows that e is
revealed whenever the first-choice error positions do not completely cover any code word of the
combination code (because then all the error vectors will be in the row space of S, even if the error
vectors are linearly dependent). It is found that the probability that the error vectors are not all in
the row space of S is about four orders of magnitude lower than the decoder error probability for
an equal-rate maximum distance nonbinary outer code working with a 0.01 - 0.1 vector symbol error
probability range, for cases of a (15, 4) binary combination code and a (23, 12) Golay binary
combination code. Even for a randomly-chosen (255, 223) binary combination code, the probability
that the errors are not all in the row space of S is at least two orders of magnitude lower than the decoder
failure probability of a (255, 223) Reed-Solomon code correcting up to the guaranteed error
probability, in the symbol error probability range 0.02-0.06. For conditional second choice error
probability less than about 0.3 and large r, the decoder failure probability closely approximates the
probability that the first-choice error positions cover some code word, in the ranges stated.
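The row-space test at the heart of this idea can be sketched as follows. This is a simplified stand-in rather than the decoder described above: instead of column operations on an augmented matrix it performs a direct GF(2) membership test, each r-bit vector is stored as a Python integer, and the names `row_space_basis` and `in_row_space` are hypothetical.

```python
def row_space_basis(rows):
    """Reduce r-bit rows (given as ints) to a GF(2) basis keyed by leading bit."""
    basis = {}
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]  # cancel the leading bit and keep reducing
    return basis


def in_row_space(basis, v):
    """Return True if v is a GF(2) combination of the basis vectors."""
    while v:
        lead = v.bit_length() - 1
        if lead not in basis:
            return False
        v ^= basis[lead]
    return True
```

Over GF(2) the difference x - y is simply the bitwise XOR of the two alternatives. A row space of rank 2 inside 4-bit vectors contains 4 of the 16 possible vectors, which illustrates why a randomly chosen false difference only rarely passes the test when r is much larger than the rank.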

If $\rho$ is the rank of S, there is a probability of about $2^{-(r-\rho)}$ that x - y is in the row space of S
for a false y. However, the error location vector acts as added verification of whether the position of
y is one of the apparent error locations. Another theorem shows that the number of false apparent
error locations in the error location vector cannot exceed the number of combination code words
covered by the error vectors in at least all but one position. This number would usually be zero, and
rarely > 1.

Abstract —

We present a method for selectable delay turbo coding that,
using a common serially-concatenated encoding operation with
interleavers of growing lengths, allows decoders having differing
latency constraints to achieve decoding power commensurate
with that delay.

I.  Introduction 

Concatenated encoding with soft, iterative decoding is known to
be capable of achieving operation near the Shannon bound on the
AWGN and other channels, provided the interleaving length is
sufficiently large (see for example [1]). Other than the decoder
computational complexity attached to the APP algorithm and subsequent
iterative decoding passes, the primary negative aspect of turbo codes
is the latency (delay) associated with the interleaving and deinterleaving
mappings in the decoder. It is now known [2], [3] that concatenated
systems with iterative decoders perform nearly as well as any code for
a given specified latency. Still, latency is a sensitive issue for many
applications, e.g. two-way interactive voice traffic. Also, it is easy
to envision applications whose latency allowances vary widely from
frame to frame. We thus propose a system which admits decoder
choice in latency, while sharing a common transmission framework.
We suggest this has applications in multimedia data transmission,
or in allowing performance/delay tradeoffs.

II.  System  Description 

We will illustrate the concept with the three-level serial concatenation
scheme shown in Figure 1. A message u with length N enters
the encoder, and is systematically encoded by a 2-state convolutional
encoder with feedback. The sequence $p^1$ is formed as the running
sum of the input to the encoder, i.e. $p^1_n = \sum_{i \le n} u_i$. To maintain
high rate, we puncture much of the parity stream, and for discussion
here, preserve every seventh parity bit for transmission. Hence the
first component is a rate 7/8, two-state convolutional encoder. The
multiplexed bit stream is designated v and has length (8/7)N. As
with standard SCCC systems, the sequence v is bit-interleaved into
a permuted sequence w, using an interleaver. Here we suggest
interleaver size $D_1$ in the range of a few hundred bits, ultimately allowing
a modest delay iterative decoding.

Figure 1: Diagram of three-level encoder

1 This work was supported by NSF grant NCR-9714646

The pseudo-message w is presented to another encoder, also here
having two states and rate 7/8. The output parity sequence $p^2$ is
again punctured and multiplexed with the input sequence, producing
a sequence x having length $(8/7)^2 N$. Finally, x is interleaved to a
sequence y with an interleaver having size $D_2$, perhaps 4000 bits.
The sequence y is encoded as above, producing a final sequence z,
whose length is $(8/7)^3 N$, so that the overall encoding rate is $(7/8)^3$.
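The three encoding levels can be sketched as below. Several points are assumptions made for illustration only: the running sum is taken modulo 2, each kept parity bit is multiplexed directly after its group of seven input bits, a seeded random permutation of the whole sequence stands in for the interleavers of size D_1 and D_2, and the names `component_encode` and `three_level_encode` are hypothetical. The sketch produces the sequences v, x and z, not the reordered transmission sequence.

```python
import random


def component_encode(bits):
    """Rate-7/8 systematic encoding: running parity, every 7th parity bit kept."""
    if len(bits) % 7 != 0:
        raise ValueError("input length must be a multiple of 7")
    out, parity = [], 0
    for n, b in enumerate(bits, start=1):
        parity ^= b  # running sum of the input, modulo 2
        out.append(b)
        if n % 7 == 0:
            out.append(parity)  # the other six parity bits are punctured
    return out


def three_level_encode(u, seed=0):
    """Return the sequences (v, x, z) of the three-level serial concatenation."""
    rng = random.Random(seed)
    v = component_encode(u)
    w = v[:]
    rng.shuffle(w)  # stand-in for the first interleaver
    x = component_encode(w)
    y = x[:]
    rng.shuffle(y)  # stand-in for the second interleaver
    z = component_encode(y)
    return v, x, z
```

For N = 343 the three sequences have lengths 392, 448 and 512, so the overall rate is 343/512, which equals (7/8)^3.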

Instead of transmitting the sequence z we propose a reordering
so that selectable latency options exist. Specifically, we form the
sequence b as shown in Figure 1, obtained by first multiplexing v with
the punctured version of $p^2$ to produce a. Note that this sequence
retains the proper ordering of the v sequence, so that "zero-delay"
decoding is possible. Then, b is formed by multiplexing the
punctured parity sequence $p^3$ into a.

Depending on the latency allowed, three decoding architectures
are possible. First, a simple Viterbi decoding of the sequence v, with
essentially zero delay, is possible merely by deleting unnecessary
parity bits from the received stream. Second, a moderate delay iterative
decoder using two SISO modules can be built to process the noisy
reassembled version of x. This decoder's latency is proportional to $D_1$.
Finally, a full three-SISO decoder can be fashioned to exploit the entire
received stream. Its latency is proportional to $D_2$.

III.  Results 

We have simulated the performance of the three decoding options
in conjunction with BPSK transmission on a Gaussian noise channel.
To illustrate, for a message size of 4900 bits, encoded with overall rate
$(7/8)^3 = 0.67$, and interleaver sizes $D_1 = 280$ and $D_2 = 6400$, we
find that "zero delay" decoding achieves $P_b = 10^{-4}$ at $E_b/N_0 = 8$
dB, consistent with high rate, low complexity, and the penalty of
discarded parity. With two-SISO decoding (moderate delay), we achieve
a coding improvement of about 3 dB over the zero-delay option at
$P_b = 10^{-5}$. Finally, the three-level iterative decoder achieves error
probability $10^{-5}$ at $E_b/N_0 = 2.7$ dB. This is about 1.7 dB short of
channel capacity for binary PSK when R = 0.67 signaling is employed.
In summary, progressively more energy efficiency is gained as the
latency constraint is relaxed and more decoder levels are employed. In
addition to the gain implied by larger latency, we obtain extra value
from not discarding parity symbols. Each of these performances is
attainable with a common transmitting framework.

Abstract — An optimal MAP-equivalent SOVA decoding
algorithm and its simplified suboptimal algorithm
for non-binary codes are proposed. The implementation
of the suboptimal algorithm is simpler,
while its performance is very close to the optimal
Log-MAP algorithm. The proposed SOVA can be
used as a decoder for Turbo trellis coded modulation
(TTCM). It is concluded that the proposed SOVA
performs very close to the Log-MAP algorithm for
both the TTCM and binary Turbo codes, and its
performance is better than the conventional SOVA.


I.  Introduction 

The MAP and Log-MAP algorithms are optimal in the sense
of maximum a posteriori sequence probability. The simplified
versions of MAP or Log-MAP, such as Max-Log-MAP and SOVA,
are suboptimal. Optimality is traded for simplicity
of implementation. The SOVA is roughly half as complex as
the Log-MAP [2], with some performance degradation. For
non-binary codes, the complexity of the MAP algorithm becomes
overwhelming, and simplified suboptimal algorithms with
small performance degradation are badly needed.
In this paper, a different implementation of the MAP algorithm is
proposed, together with a suboptimal algorithm. It is shown that
the performance of the suboptimal algorithm for non-binary
Turbo codes is close to the performance of optimal MAP.


II.  MAP  EQUIVALENT  SOVA 

One elegant derivation of the MAP algorithm is presented in
[1] by splitting the joint probability p(s', s, y), where s' -> s is
the state transition at some epoch k, and y is the received
sequence. Another way to derive the joint probability p(s', s, y)
is

$p(s', s, y) = P(s' \mid s, y)\, P(s \mid y)\, p(y).$  (1)

1 This research was supported by the National Science Foundation
under grant CCR-9903297.

The probability of s' given s and the received y is

$P(s' \mid s, y) = \frac{p(\mathbf{s}_k^{(s' \to s)}, y_{j \le k})}{\sum_{s''} p(\mathbf{s}_k^{(s'' \to s)}, y_{j \le k})},$  (2)

where $\mathbf{s}_k^{(s' \to s)}$ is the trellis path containing branch (s' -> s),
and $\mathbf{s}_k^{(s'' \to s)}$ is the trellis path containing branch (s'' -> s) at
epoch k. For each state s at k, the number of trellis branches
terminating at s is equal to M for M-ary codes. There are
M possible states s'' for state transitions s'' -> s. The ratio
of the probability of a trellis path containing branch (s' -> s)
to the sum of the probabilities of all possible trellis paths
terminating at s is the probability of state s' given s and y,
i.e., P(s' | s, y). From the VA, the path metric of path $\mathbf{s}_k$ is
$M_k(\mathbf{s}_k) = \log(p(\mathbf{s}_k, y_{j \le k}))$. Then we have

$P(s' \mid s, y) = \frac{\exp(M_k(\mathbf{s}_k^{(s' \to s)}))}{\sum_{s''} \exp(M_k(\mathbf{s}_k^{(s'' \to s)}))},$  (3)


where the received symbol sequence $y_{j>k}$ is independent of
the state transition at epoch k. The conditional probability
P(s | y) can be obtained through backward recursion, as

$P(s' \mid y) = \sum_{s} P(s' \mid s, y)\, P(s \mid y).$  (4)

The initial value of the backward recursion is P(s | y) = 1 for
the terminal state s at epoch k = N, and P(s | y) = 0 otherwise.

The soft output should have M possible values for M-ary
source symbols. The soft output value of the proposed algorithm
in probability form is obtained as

$P(u_k = a \mid y) = \sum_{(s' \to s):\, u_k = a} p(s', s, y)/p(y) = \sum_{(s' \to s):\, u_k = a} P(s \mid y)\, \frac{\exp(M_k(\mathbf{s}_k^{(s' \to s)}))}{\sum_{s''} \exp(M_k(\mathbf{s}_k^{(s'' \to s)}))}.$  (5)

Using the joint probability in (1) with the a posteriori
probability definition, we derived an implementation of the MAP
algorithm. In our discussion, the path metrics are computed
with the VA. Equation (5) is the a posteriori probability, or the soft
output definition of the VA, so we can still call the proposed MAP
implementation a SOVA. However, the proposed SOVA is
different from the SOVA in the literature [1, 3, 4]. The proposed
SOVA is an optimal MAP algorithm, while the conventional
SOVA is a Max-Log-MAP equivalent, which is sub-optimal.
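The normalisation of path metrics and one step of the backward recursion can be sketched numerically as below. The sketch makes no claim about how the authors compute the metrics: it simply takes a table of metrics M_k for the branches entering each state as given, and the names `transition_posteriors` and `backward_step` are hypothetical. Subtracting the largest metric before exponentiating is a standard numerical safeguard and does not change the ratios.

```python
import math


def transition_posteriors(metrics):
    """Normalise branch metrics into P(s' | s, y).

    metrics[s][s_prev] is the metric of the path entering state s
    through the branch s_prev -> s.
    """
    post = {}
    for s, incoming in metrics.items():
        top = max(incoming.values())
        total = sum(math.exp(m - top) for m in incoming.values())
        post[s] = {sp: math.exp(m - top) / total for sp, m in incoming.items()}
    return post


def backward_step(post, p_state):
    """One backward step: P(s' | y) = sum over s of P(s' | s, y) P(s | y)."""
    previous = {}
    for s, incoming in post.items():
        for sp, prob in incoming.items():
            previous[sp] = previous.get(sp, 0.0) + prob * p_state.get(s, 0.0)
    return previous
```

Starting from a known terminal state (probability 1 on that state, 0 elsewhere), repeated calls to `backward_step` propagate the state probabilities back through the trellis, and the result of each step again sums to one.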

Similar to the MAP algorithm, the proposed SOVA soft
output in (5) can be simplified by passing to the log-domain
of probability.

Abstract — Geometric interpretation of turbo-decoding
has established an analytical basis, and provided
tools for the analysis of this algorithm. Based
on this geometric framework, we extend the analytical
results for turbo-decoding of product codes, and
show how analysis tools can be practically adopted
for this case. Specifically, we investigate the algorithm's
stability and its convergence rate. We present
new results concerning the structure and properties
of stability matrices of the algorithm, and develop upper
bounds on the algorithm's convergence rate. We
prove that for any 2 x 2 (information bits) product
codes, there is a unique and stable fixed point. For the
general case, we present sufficient conditions for stability.
The interpretation of these conditions provides
an insight into the behavior of the decoding algorithm.

I.  Introduction 

Turbo codes, first introduced in 1993 [1], are considered
one of the most important developments in coding theory in
recent years. Although simulation and practical results generally
show excellent performance, there is a lack of theoretical
basis for explaining the results and providing tools for their
analysis. Recently, a new approach [2] of geometric interpretation
of the decoding algorithm has managed to reveal
interesting features of the decoding process. Based on it, we
extend the analytical results, and use simulations to gain a
deeper understanding of the turbo-decoding of product codes.

II.  Product  Codes  Turbo-Decoding 

A product code (without checks on checks) turbo encoder
uses block encoders, and a rows-to-columns interleaver. The
information bits are arranged in $k_r$ rows and $k_c$ columns. The
i-th row ($x^{r_i}$) enters an ($n_y, k_c, d_r$) block encoder and forms a
row code word $y^i$. The i-th column ($x^{c_i}$) enters an ($n_z, k_r, d_c$)
block encoder and forms a column code word $z^i$ (where $d_r$ and
$d_c$ are the minimal distances of the row and column codes,
respectively).

Let $P_x$, $P_y$ and $P_z$ represent the log-densities corresponding
to the posterior densities $p(x|x)$, $p(y|x)$ and $p(z|x)$,
respectively. Let $Q_y$, $Q_z$ denote the extrinsic information from
the row and column decoders, respectively. In [2] it is shown that the
stability of the decoding algorithm is determined by the stability of
$S$:

$$S = S_R S_C = (J_{P_x+P_y+Q_z} - I)(J_{P_x+Q_y+P_z} - I) \qquad (1)$$

and the general expression for the Jacobian matrix is given. Using the
independence of the decoding of different rows (or columns), we develop
an explicit expression for $J_P$. For example, for the Jacobian of the
row decoding, $(J_R)_{i,j}$, we get (with $P = P_x + P_y + Q_z$):

$$(J_R)_{i,j} = \begin{cases} e^{P(x_j=1 \mid x_i=1)} - e^{P(x_j=1 \mid x_i=0)}, & x_i, x_j \in x^{r_a} \\ 0, & x_i \in x^{r_a},\ x_j \in x^{r_b} \end{cases} \qquad (2)$$

The brute-force calculation complexity of a $J_R$ element is
$O(2^{k_c-1})$. Note also that $J_R$ is a block-diagonal matrix whose
$i$-th block ($J_{R_i}$) is the Jacobian matrix of the $i$-th row
decoding.
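The brute-force evaluation of one row block can be sketched as follows. This is only an illustration under assumptions: the row posterior is supplied as a hypothetical table `weights` of unnormalised probabilities over the $k_c$ information bits of one row, and each entry is taken to be the difference of conditional probabilities $p(x_j=1 \mid x_i=1) - p(x_j=1 \mid x_i=0)$. The function name and the toy numbers are not from the paper.

```python
def row_jacobian(weights, k):
    """Brute-force J[i][j] = p(x_j=1 | x_i=1) - p(x_j=1 | x_i=0).

    `weights` maps each length-k bit tuple to an unnormalised posterior
    probability (a hypothetical stand-in for the posterior of one row).
    Every conditional sums over half of the 2^k bit patterns.
    """
    J = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cond = []
            for b in (1, 0):
                num = sum(w for x, w in weights.items() if x[i] == b and x[j] == 1)
                den = sum(w for x, w in weights.items() if x[i] == b)
                cond.append(num / den)
            J[i][j] = cond[0] - cond[1]
    return J

# Toy row posterior over k = 2 information bits (made-up values).
weights = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
J = row_jacobian(weights, 2)
```

In this sketch the diagonal entries equal 1 and every entry is a difference of two probabilities, so its absolute value cannot exceed 1.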

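We show that for general values of $k_r$ and $k_c$, $S$ is a block
matrix, where each block ($S^{i,j}$) is a $k_c \times k_c$ matrix with
an all-zeros diagonal. The main diagonal of $S$ consists of zero blocks
($S^{i,i} = 0$). For $k_r = k_c = 2$ we get:

$$S = \begin{pmatrix} 0 & 0 & 0 & a_{1,2} b_{2,4} \\ 0 & 0 & a_{2,1} b_{1,3} & 0 \\ 0 & a_{3,4} b_{4,2} & 0 & 0 \\ a_{4,3} b_{3,1} & 0 & 0 & 0 \end{pmatrix}, \qquad (3)$$

where $a_{i,j} = (S_R)_{i,j}$ and $b_{i,j} = (S_C)_{i,j}$.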

Theorem 1: The fixed point of any product code turbo-decoder with
$k_r = k_c = 2$ is always stable.

Proof: From (2) we deduce that the absolute value of each element of
$J_R$ is less than or equal to 1; hence, $|a_{i,j}| < 1$. The same
holds for $b_{i,j}$. Therefore, the eigenvalues of $S$ are inside the
unit circle, and $S$ is stable (regardless of the SNR or the row or
column encoders).
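The eigenvalue argument for the $k_r = k_c = 2$ case can be checked numerically with a small sketch. It assumes the four information bits are numbered 1 to 4 row by row, so that the rows are {1,2} and {3,4} and the columns are {1,3} and {2,4}; the numeric entries are arbitrary values of magnitude below 1, chosen only for illustration.

```python
import cmath

def stability_matrix(a, b):
    """S = S_R S_C for k_r = k_c = 2, bits numbered 1..4 row by row.

    a[(i, j)] and b[(i, j)] are the off-diagonal entries of S_R and S_C
    (hypothetical numeric values).
    """
    SR = [[0.0] * 4 for _ in range(4)]
    SC = [[0.0] * 4 for _ in range(4)]
    for (i, j), v in a.items():
        SR[i - 1][j - 1] = v
    for (i, j), v in b.items():
        SC[i - 1][j - 1] = v
    return [[sum(SR[r][k] * SC[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def spectral_radius_antidiagonal(S):
    """S is anti-diagonal here, so S^2 is diagonal and the eigenvalues of S
    are the square roots of S[r][3-r] * S[3-r][r]."""
    return max(abs(cmath.sqrt(S[r][3 - r] * S[3 - r][r])) for r in range(4))

a = {(1, 2): 0.9, (2, 1): -0.8, (3, 4): 0.7, (4, 3): 0.95}   # row dependencies
b = {(1, 3): 0.6, (3, 1): -0.9, (2, 4): 0.85, (4, 2): 0.5}   # column dependencies
S = stability_matrix(a, b)
rho = spectral_radius_antidiagonal(S)
```

Each eigenvalue magnitude is the square root of a product of four factors, each below 1 in magnitude, so `rho` stays below 1 for any such choice of entries.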

For the general case, we develop in [3] an upper bound for the maximal
eigenvalue of $S$ (which governs the convergence rate in the vicinity
of the fixed point), and sufficient conditions for the stability of the
decoding algorithm. The basic component in these conditions is the
product of the posterior dependence between two bits in a row and the
sum of the posterior dependencies between one of these bits and all the
bits in its column. Hence, small column posterior dependencies (i.e.,
successful column decoding) can compensate for a large value of
inter-row bit dependence (i.e., unsuccessful row decoding) and vice
versa.

In our talk we present simulation results for the stability matrices of
Hamming $[(7,4,3)]^2$ and Golay $[(24,12,8)]^2$ product codes. Further
analysis of the results is made using distribution histograms of the
complete eigenvalue spread at the algorithm's fixed point.
Abstract - After recognizing classical turbo-decoding as fix-point
iteration, alternative numerical methods such as Gauss-Seidel iteration,
Jacobi over-relaxation iteration, the damped substitution method, and
Newton-type methods are evaluated. None of these methods seems to
perform better than fix-point iteration, and it is noticed that not even
Newton's method performs better, due to the presence of a random
interleaver.

I.  Problem  Statement 

Consider a turbo code [1] with a random interleaver of length $N$ equal
to the block size, transmitted over an additive white Gaussian noise
channel using a binary antipodal signal set. The iterative receiver
employs a posteriori probability (APP) algorithms (MAP decoders) for
the two constituent codes. In classical turbo-decoding, the MAP
decoders simply feed each other after proper
interleaving/deinterleaving, and the final decisions are based on the
APPs from the last decoder after a number of iterations. Given a block
of received samples, a full decoding iteration corresponds to a
nonlinear function $g : \mathbb{R}^N \to \mathbb{R}^N$, which produces
extrinsic log-likelihood ratios (LLR) from the input a priori LLRs.
Using similar functions $c_1$, $c_2$ for the constituent MAP decoders,
and denoting the operations of the interleaver/deinterleaver with
permutation matrices $P$ and $P^{-1} = P^T$, respectively,
$g(L^n) = P^T c_2(P c_1(L^n))$ at iteration $n$. Usually, $L^0 = 0$.

For every received block, the goal is to find a multidimensional
fix-point $L^*$, in which $L^* = g(L^*)$. This implies convergence of
bit decisions based on $L^*$. Equivalently, we can find a root of
$f(L^*) = 0$ where $f(L) = g(L) - L$. Clearly, classical turbo-decoding
corresponds to fix-point iteration (or the substitution method),
$L^{n+1} = g(L^n)$. Of course, fix-points can be found (possibly
faster) with several other numerical methods.
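As a minimal sketch of the substitution method $L^{n+1} = g(L^n)$, the snippet below iterates a toy map. The function `toy_g` is a hypothetical stand-in for a full decoding iteration (it is not a MAP decoder): it adds made-up channel LLRs to a saturating coupling between neighbouring positions, which makes it a contraction so that the iteration converges.

```python
import math

def fixed_point(g, L0, iters):
    """Plain substitution: L^{n+1} = g(L^n), starting from L0."""
    L = list(L0)
    for _ in range(iters):
        L = g(L)
    return L

def toy_g(L):
    # Hypothetical stand-in for one full turbo iteration: made-up channel
    # LLRs plus a saturating coupling between neighbouring positions.
    ch = [1.0, -0.5, 2.0]
    n = len(L)
    return [ch[k] + 0.5 * math.tanh(L[(k + 1) % n]) for k in range(n)]

L_star = fixed_point(toy_g, [0.0, 0.0, 0.0], 50)   # L^0 = 0
# Residual of f(L) = g(L) - L at the returned point.
residual = max(abs(x - y) for x, y in zip(toy_g(L_star), L_star))
```

The residual measures how far the returned point is from being a root of $f(L) = g(L) - L$.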

II.  Overview  of  Numerical  Methods 

First, consider a recursion similar to the Gauss-Seidel method for a
system of linear equations [2]:
$L_k^{n+1} = g_k(L_1^{n+1}, \ldots, L_{k-1}^{n+1}, L_k^n, \ldots, L_N^n)$,
$k = 1, \ldots, N$. Normally, it converges slightly faster than
fix-point iteration. However, using block-mode MAP decoders, $c_1$ and
$c_2$ are evaluated for all $k$ at the same time, thus not allowing a
successive evaluation. Instead, consider a method similar to Jacobi
over-relaxation (JOR) iteration for linear equations [3]:
$L^{n+1} = \alpha_n g(L^n) + (1 - \alpha_n) L^n$, $0 < \alpha_n \le 1$.
If it converges, it still solves the fix-point problem. Since the
Jacobian of $g$, $J_g(L)$, is attenuated by a factor $\alpha_n$, we
expect convergence more frequently, but possibly also at a slower rate.
As a third alternative, consider the damped substitution (DS) method
$L^{n+1} = g_{DS}(L^n) = \alpha_n g(L^n) = \alpha_n P^T c_2(P c_1(L^n))$,
$\alpha_n > 0$, where $\alpha_n$ approaches one as $n$ increases.
$J_g(L^*)$ and $J_{g_{DS}}(L^*)$ are the same, suggesting convergence
equally frequently. The DS method does not differ very much from the
modified DS
$L^{n+1} = P^T c_2(\sqrt{\alpha_n}\, P c_1(\sqrt{\alpha_n}\, L^n))$,
originally used in [1].
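The JOR update $L^{n+1} = \alpha_n g(L^n) + (1 - \alpha_n) L^n$ can be sketched on a toy problem. The map `toy_g` and the constant sequence $\alpha_n = 0.5$ are illustrative assumptions, not the decoders or the coefficient sequences used in the paper; setting every $\alpha_n = 1$ recovers plain fix-point iteration.

```python
import math

CH = [1.0, -0.5, 2.0]  # made-up channel LLRs

def toy_g(L):
    # Hypothetical contraction standing in for one full decoding iteration.
    n = len(L)
    return [CH[k] + 0.5 * math.tanh(L[(k + 1) % n]) for k in range(n)]

def jor(g, L0, alphas):
    """JOR-style iteration: L <- a*g(L) + (1-a)*L for each a in alphas."""
    L = list(L0)
    for a in alphas:
        gL = g(L)
        L = [a * gi + (1 - a) * li for gi, li in zip(gL, L)]
    return L

def dist(u, v):
    return max(abs(x - y) for x, y in zip(u, v))

L_ref = jor(toy_g, [0.0] * 3, [1.0] * 200)    # alpha_n = 1: fix-point iteration
L_fp10 = jor(toy_g, [0.0] * 3, [1.0] * 10)
L_jor10 = jor(toy_g, [0.0] * 3, [0.5] * 10)
L_jor200 = jor(toy_g, [0.0] * 3, [0.5] * 200)
```

On this toy map both variants reach the same fix-point, but after ten iterations the relaxed version is still farther from it, which matches the expectation of a slower rate.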

With Newton's method for solving $f(L^*) = 0$, a common approach [4] is
to use $L^{n+1} = L^n + \alpha_n s^n$, $0 < \alpha_n \le 1$, with the
Newton direction $s^n = -J_f^{-1}(L^n) f(L^n)$. $\alpha_n$ is
determined at each iteration by performing a backtracking line search,
which aims at yielding a sufficient decrease in some function relating
to the distance from the solution, e.g. $d(L) = \|f(L)\|^2$. Requiring
a decrease in $d(L)$ is exactly what we would do if we were trying to
minimize $d(L)$ over $L$. However, not all of its local minima need be
roots of $f$, a fact which turns out to be a clear drawback in
turbo-decoding. (In fact, all gradient methods suffer from the problem
of finding false roots.) Furthermore, due to the random interleaver,
most of the elements of $J_g(L)$ are close to zero, hence
$J_f(L) \approx J_f^{-1}(L) \approx -I$, reducing Newton's method to
the JOR iteration. In practice, $J_g(L)$ certainly has some significant
elements, but we can still imagine that the strength of Newton's
method, namely exploiting the Jacobian, is of little value for a
problem such as turbo-decoding.

III.  Simulation  Results  and  Conclusions 

The different methods were compared by computer simulations estimating
the bit error rate (BER) for a non-punctured turbo code employing two
identical rate-1/2 recursive systematic convolutional codes with
generating matrix $G(D) = [\,1,\ (1 + D^2)/(1 + D + D^2)\,]$, separated
by an $S$-random interleaver with $S = 19$ and $N = 1024$. At
$E_b/N_0 = 0.75$ dB, BER $\approx 10^{-3}$ after 10 iterations with
fix-point iteration. With the JOR, DS, and modified DS iterations, a
large number of attenuation coefficient sequences ending with
$\alpha_n = 1$ were evaluated. None of these methods was able to
converge faster (in terms of BER for a given number of iterations) than
fix-point iteration. With Newton's method, the BER flattens out at
approximately $10^{-2}$ even in the absence of noise, evidently because
it finds a false root for many blocks. In conclusion, fix-point
iteration seems to be the superior choice in most situations.
Abstract - The asymptotic behaviour of the minimum number of repeated
transmissions of a sequence over a discrete memoryless channel
sufficient for its exact (or within a permissible Hamming distance)
reconstruction with a given error probability is found.

I.  Introduction 

Traditional problems of the theory of information transmission concern
the efficient transmission of messages over noisy channels which are
described by combinatorial or probabilistic conditions. To solve these
problems, coding is used to introduce redundancy into messages so that
the distance between the encoded messages is sufficiently large and
allows one to correct errors at the output of the channel. However,
some natural problems, such as the analysis of observations or the
recovery of genetic information, require reconstructing an arbitrary
sequence from a sufficiently large number $N$ of its erroneous
patterns. In these cases we in fact use repeated transmission, because
such patterns can be considered as the result of an $N$-tuple
transmission of the sequence over the same combinatorial or
probabilistic channel. This gives rise to new combinatorial and
probabilistic problems of finding the minimum number $N$ of erroneous
patterns sufficient to reconstruct an unknown sequence with a given
accuracy. The precise setting of the problems and solutions to a number
of them can be found in [2], [3]. Below we present some results for a
discrete memoryless channel $C$ with a matrix $(p_{i,j})$ of transition
probabilities from the set $F_q = \{0, 1, \ldots, q-1\}$ into $F_r$,
$q \ge 2$, $r \ge 2$.

II.  Optimal  and  Reducible  N-Reconstructors


We deal with non-degenerate channels $C$ for which the transition
matrix $(p_{i,j})$ does not have two identical rows and contains a
column with at least two nonzero probabilities. For any $i, k \in F_q$,
denote by $C(i,k)$ the subset (which may be empty) consisting of all
$j \in F_r$ such that $p_{i,j} p_{k,j} > 0$. For any $s$,
$0 \le s \le 1$, let
$\alpha_{i,k}(s) = \sum_{j \in C(i,k)} p_{i,j}^{1-s} p_{k,j}^{s}$ and
$\alpha(C) = \max_{i,k \in F_q,\ i \ne k} \min_{0 \le s \le 1} \alpha_{i,k}(s)$.
One can show that $0 < \alpha(C) < 1$ if and only if $C$ is a
non-degenerate channel. For any $x = (x_1, \ldots, x_n) \in F_q^n$ and
$Y = (y_1, \ldots, y_N)$, where
$y_j = (y_{1,j}, \ldots, y_{n,j}) \in F_r^n$, $j = 1, \ldots, N$, we
set $P_C(Y|x) = \prod_{j=1}^{N} \prod_{k=1}^{n} p_{x_k, y_{k,j}}$. We
consider $Y$ as the matrix $(y_{i,j})$ of size $n \times N$ over $F_r$
and denote by $Y_{n,N}$ the set of all $r^{nN}$ such matrices. Let
$M_N$ be the set of all mappings $f : Y_{n,N} \to F_q^n$,
$n = 1, 2, \ldots$, which are referred to as $N$-reconstructors. Given
an integer-valued function $d = d(n)$, $0 \le d < n$, and
$f \in M_N$, one can calculate the error probability
$P_C(f,x,d,N) = \sum_{Y \in Y_{n,N},\ d_H(f(Y),x) > d} P_C(Y|x)$ of
reconstructing $x \in F_q^n$ with at most $d$ wrong letters (here
$d_H(u,x)$ is the Hamming distance). Note that the case $d = 0$
corresponds to exact reconstruction. We set

¹This work was supported by the RFBR Grant 99-01-00941.

$$P_C(n,d,N) = \min_{f \in M_N} \max_{x \in F_q^n} P_C(f,x,d,N) \qquad (1)$$

and call an $N$-reconstructor $f$ optimal if it attains the minimum in
(1) for all $n$ (optimal $N$-reconstructors exist for any function
$d = d(n)$). An $N$-reconstructor $f$ for a memoryless channel $C$ is
called reducible if there exists a memoryless channel $C_N$ such that
for any $n$ ($n = 1, 2, \ldots$) and $x, z \in F_q^n$,
$\sum_{Y \in Y_{n,N},\ f(Y) = z} P_C(Y|x) = P_{C_N}(z|x)$. Thus, the
action of a reducible $N$-reconstructor reduces $N$-tuple transmission
of a message over $C$ to its single transmission over another
"improved" memoryless channel $C_N$. Reducible $N$-reconstructors are
in general not optimal, but we use them and the classical work [1] to
obtain the following estimates.

Theorem  1  For  any  non- degenerate  discrete  memoryless 
channel  C,  Pc(n,d,  N)  =  —  p)n~l ,  where 

(2  q)-1  (a{C)f  <  P  <  (q  -  l)  (a(C))N  ,  and 

$\beta(C) = \frac{1}{2} \min \max_{j \in C(i,k)} |\ln(p_{i,j}/p_{k,j})|$,
with the minimum being taken over all $i, k \in F_q$ such that
$\alpha(C) = \min_{0 \le s \le 1} \alpha_{i,k}(s)$.


III.  The  Minimum  Number  of  Repetitions

Denote by $N_C(n,d,\varepsilon)$ the minimum integer $N$ such that
$P_C(n,d,N) \le \varepsilon$, $0 < \varepsilon < 1/2$. Thus,
$N_C(n,d,\varepsilon)$ is the minimum number of repeated transmissions
that allow one to reconstruct any sequence of length $n$ with accuracy
up to $d$ letters with error probability at most $\varepsilon$.


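Theorem 2 Let $\varepsilon = \varepsilon(n) > 0$ and $d = d(n) \ge 0$
be functions such that $\varepsilon \to 0$ and $d/n \to 0$ as
$n \to \infty$. Then for any non-degenerate discrete memoryless channel
$C$,

$$N_C(n,d,\varepsilon) \sim \left( \ln\frac{n}{d+1} + \frac{1}{d+1} \ln\frac{1}{\varepsilon} \right) \Big/ \ln\frac{1}{\alpha(C)}.$$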

In particular, by Theorem 2, $N_C(n,d,\varepsilon)$ grows linearly with
the length $n$ when the permissible error probability $\varepsilon$ of
reconstructing a sequence with a fixed number $d$ or fewer wrong
letters decreases exponentially with $n$. On the other hand, one can
prove that if $d \ge \delta n$, where $0 < \delta < 1$, and
$\varepsilon \ge 2^{-cn}$, $c > 0$, then $N_C(n,d,\varepsilon)$ is
bounded above by a constant. For instance, in the case of the binary
symmetric channel with $p = 0.02$, we get that for $\delta = 0.01$ and
$c = 0.1$ five repetitions are sufficient independently of the length
$n$.
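To make the quantities concrete, the following sketch evaluates $\alpha(C)$ and the leading term of Theorem 2 for a binary symmetric channel. It rests on two assumptions: that $\alpha_{i,k}(s)$ has the Chernoff-type form $\sum_{j \in C(i,k)} p_{i,j}^{1-s} p_{k,j}^{s}$, and that the leading term is $(\ln\frac{n}{d+1} + \frac{1}{d+1}\ln\frac{1}{\varepsilon}) / \ln\frac{1}{\alpha(C)}$. The function names and the grid search over $s$ are illustrative choices.

```python
import math

def alpha(P, steps=1000):
    """alpha(C) = max_{i != k} min_{0<=s<=1} sum_j p_ij^(1-s) * p_kj^s,
    summing only over outputs j where both probabilities are positive.
    The minimum over s is approximated on a uniform grid."""
    best = 0.0
    q = len(P)
    for i in range(q):
        for k in range(q):
            if i == k:
                continue
            common = [(pi, pk) for pi, pk in zip(P[i], P[k]) if pi > 0 and pk > 0]
            m = min(sum(pi ** (1 - s / steps) * pk ** (s / steps) for pi, pk in common)
                    for s in range(steps + 1))
            best = max(best, m)
    return best

def repetitions_estimate(n, d, eps, a):
    """Leading term (ln(n/(d+1)) + ln(1/eps)/(d+1)) / ln(1/a)."""
    return (math.log(n / (d + 1)) + math.log(1 / eps) / (d + 1)) / math.log(1 / a)

bsc = [[0.98, 0.02], [0.02, 0.98]]      # binary symmetric channel, p = 0.02
a = alpha(bsc)                          # equals 2*sqrt(p*(1-p)) for a BSC
est = repetitions_estimate(1000, 0, 1e-3, a)
```

For the binary symmetric channel the minimum over $s$ is attained at $s = 1/2$, which gives $\alpha(C) = 2\sqrt{p(1-p)}$.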
Abstract - In this paper, we present a new approach for channel coding
on unknown or time-varying channels. Given a family of possible channel
characteristics, we define a multi-resolution channel code as a single
channel coding codebook from which a collection of codes of increasing
rates is extracted by choosing larger and larger nested subsets of the
original set of codewords. We give an achievable rate region and a
tight converse.

I.  Introduction 

Consider the problem of channel coding for an unknown or time-varying
channel. Given a family of possible channel characteristics, in theory
we could design a different channel code for each channel in our
collection. Assuming knowledge (at both the transmitter and the
receiver) of the channel in operation at communication time, both
encoder and decoder could choose among the family of codes. The
resulting strategy would theoretically achieve the capacity of any
channel in the collection (e.g., [1], [2] and [3]). Unfortunately, this
approach requires an uncountably infinite collection of channel codes
when $\Omega$ is uncountably infinite. We define a multi-resolution
channel code (MRCC) as a single channel coding codebook from which a
collection of codes of increasing rates is extracted by choosing larger
and larger nested subsets of the original set of codewords. Given a
collection $\Omega$ of channels and a fixed set of $2^{nR_{\max}}$
channel codewords, the MRCC uses the first $2^{nr(\theta)}$ codewords
to code at rate $r(\theta) \le R_{\max}$ across channel
$\theta \in \Omega$. We here consider the set of rate functions
$r(\cdot)$ achievable on a fixed class $\Omega$ of channels. MRCCs are
similar in application to punctured channel codes (e.g., [4]).
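The nested-subset idea can be sketched in a few lines. This is only an illustration of the indexing: the random binary codewords are a placeholder for a designed codebook, and the names `make_codebook` and `subcode` are hypothetical.

```python
import math
import random

def make_codebook(n, r_max, seed=0):
    """Ordered list of floor(2^(n*r_max)) binary codewords of length n.
    Random codewords are only a placeholder for a designed codebook."""
    rng = random.Random(seed)
    size = math.floor(2 ** (n * r_max))
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(size)]

def subcode(codebook, n, rate):
    """Codewords used at the given rate: the first floor(2^(n*rate)) entries."""
    return codebook[: math.floor(2 ** (n * rate))]

n, r_max = 8, 0.75
book = make_codebook(n, r_max)
low = subcode(book, n, 0.25)    # a channel assigned a small rate
high = subcode(book, n, 0.5)    # a channel assigned a larger rate
```

Because every subcode is a prefix of the same ordered list, the code used on a worse channel is automatically contained in the code used on a better one.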

II.  Preliminaries 

Consider a class $\Omega$ of memoryless channels with common input
alphabet $A$ and output alphabet $B$. For each $\theta \in \Omega$, let
$C(\theta)$ and $\nu_\theta$ denote the capacity and conditional
distribution of channel $\theta$, respectively. Given $\Omega$, a
positive constant $R_{\max}$, and a rate function
$r : \Omega \to [0, R_{\max}]$ that is measurable with respect to the
Borel $\sigma$-algebra of open subsets on $\Omega$, an MRCC
$C_n = (T_n, f_n, g_n, r)$ on $\Omega$ is a single channel code defined
by a codebook $T_n$, a measurable encoder $f_n$, and a measurable
decoder $g_n$. The channel codebook $T_n$ contains
$\lfloor 2^{nR_{\max}} \rfloor$ blocklength-$n$ codewords. The
codewords are ordered and denoted by
$T_n = \{a_n(1), \ldots, a_n(\lfloor 2^{nR_{\max}} \rfloor)\}$. The
channel $\theta \in \Omega$ in operation is assumed to be fixed and
known to the channel code's encoder and decoder during any single
channel use. The channel may vary from channel use to channel use. For
any $\theta \in \Omega$, the code is used at rate $r(\theta)$ on
channel $\theta$. The collection $U_n^{(\theta)}$ of allowable messages
on $\theta$ is defined as
$U_n^{(\theta)} = \{1, \ldots, \lfloor 2^{nr(\theta)} \rfloor\}$. For
any $\theta \in \Omega$, the encoder is defined as
$f_n(\theta, u) = a_n(u)$ for all $u \in U_n^{(\theta)}$; the
corresponding decoder $g_n(\theta, \cdot)$ maps the channel output
space $B^n$ back to the set $U_n^{(\theta)}$ of allowable messages.

¹This work was supported by NSF MIP-9501977 and CCR-9909026 and grants
from the Lee Center for Advanced Networking and the Powell Foundation.

For any $\theta \in \Omega$
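and $u \in U_n^{(\theta)}$, let
$D_n^{(\theta)}(u) = \{y^n \in B^n : g_n(\theta, y^n) = u\}$ represent
the decoding cell associated with $u$ and $\theta$. Then for any class
$\Omega$ of channels and MRCC $C_n = (T_n, f_n, g_n, r_n)$, we define
the average probability of error of $C_n$ on $\Omega$ with respect to
$\beta$ as

$$P_e^{\beta}(C_n, \Omega) = \int_{\Omega} \frac{1}{|U_n^{(\theta)}|} \sum_{u \in U_n^{(\theta)}} \nu_\theta^n\big( (D_n^{(\theta)}(u))^c \mid a_n(u) \big)\, d\beta(\theta),$$

where $\beta$ is an arbitrary distribution on $\Omega$.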

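A $(\lfloor 2^{nR_{\max}} \rfloor, n, r(\cdot), \varepsilon)$-block
MRCC for $(\Omega, \beta)$ is defined as an MRCC
$C_n = (T_n, f_n, g_n, r)$ with
$P_e^{\beta}(C_n, \Omega) \le \varepsilon$. For any $R_{\max} < \infty$,
we call the rate function $r : \Omega \to [0, R_{\max}]$ achievable on
$\Omega$ if for any distribution $\beta$ there exists a sequence of
$(\lfloor 2^{nR_{\max}} \rfloor, n, r_n(\cdot), \varepsilon_n)$-block
MRCCs with respect to $\beta$ on $\Omega$ such that
$\lim_{n \to \infty} r_n(\theta) = r(\theta)$ for each
$\theta \in \Omega$ and $\lim_{n \to \infty} \varepsilon_n = 0$.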

III.  Results 

Theorem 1 If $\Omega$ is a collection of stationary, memoryless
channels such that $\max_{\theta \in \Omega} C(\theta) < \infty$, then

$$r(\theta) = C(\theta) \quad \forall \theta \in \Omega$$

is achievable if one of the following holds: (1) $\Omega$ is finite;
(2) the channel input alphabet is finite; (3) the optimal input
distribution for each $\theta \in \Omega$ has a bounded derivative and
power.

Theorem 2 If $\Omega$ is a collection of stationary, memoryless
channels such that $\max_{\theta \in \Omega} C(\theta) < \infty$ and
the power budget is $P(\theta)$ for any $\theta \in \Omega$, then

$$r(\theta) = C(\theta) \quad \forall \theta \in \Omega$$

is achievable if one of the following holds: (1) $\Omega$ is finite;
(2) the channel input alphabet is finite; (3) for each
$\theta \in \Omega$, $p_\theta(x)$, the optimal input distribution for
channel $\theta$, has a bounded derivative and for any
$\varepsilon > 0$ there exists an $S_a$ such that
$\int_{|x| > S_a} x^2 p_\theta(x)\,dx < \varepsilon$.

References 

[1] A. J. Goldsmith and M. Medard, Capacity of Time-Varying Channels
with Causal Channel Side Information, Preprint.

[2] G. Caire and S. Shamai, On the capacity of some channels with
channel state information, IEEE Trans. Inform. Theory, vol. 45,
pp. 2007-2019, Sep. 1999.

[3] J. Wolfowitz, Coding Theorems of Information Theory,
Springer-Verlag, 3rd edition, 1978.

[4] S. B. Wicker, Error Control Systems for Digital Communication and
Storage, Prentice-Hall, Englewood Cliffs, NJ, 1995.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT 2000, Sorrento, Italy, June 25-30, 2000


Transmission  of  a  Slowly  Varying  Markov  Signal 
over  Memoryless  Channels1 


Mark  S.  Pinsker 
Institute  for  Information 
Transmission  Problems 
of  the  Russian  Acad.  Sci., 
19  Bol’shoi  Karetnyi, 
101447  Moscow,  Russia 
pinsker@iitp.ru


Viacheslav  V.  Prelov 
Institute  for  Information 
Transmission  Problems 
of  the  Russian  Acad.  Sci., 
19  Bol’shoi  Karetnyi, 
101447  Moscow,  Russia 
prelov@iitp.ru 


Edward  G.  van  der  Meulen 
Department  of  Mathematics 
Katholieke  Universiteit  Leuven, 
Celestijnenlaan  200B, 

3001  Heverlee,  Belgium 
prof Sgauss . nis.kuleuven.ac.be 


Abstract - Information rates in certain discrete-time, memoryless,
stationary channels with additive non-Gaussian noise and slowly varying
input signal are investigated. Under the assumption that the input
signal is a stationary Markov chain with rare transitions, it is shown
that the information rate is asymptotically equivalent to the entropy
of the chain and, therefore, the main term of its asymptotics does not
depend on the channel noise.

I.  Introduction 

Consider a stationary channel whose output signal $Y = \{Y_j\}$ is
equal to the sum

$$Y_j = X_j + Z_j, \quad j = 0, \pm 1, \ldots, \qquad (1)$$

where the input signal $X = \{X_j\}$ and the noise $Z = \{Z_j\}$ are
independent, discrete-time, stationary processes. The problem of
explicitly calculating the information rate $I(X;Y)$ (i.e., the mutual
information per unit time) in such a channel is, except for a number of
special cases, rather hard. Therefore, it is important to obtain good
upper and lower bounds and to investigate the asymptotic behavior of
$I(X;Y)$ under different assumptions on the behavior of the parameters
which characterize the input and noise processes.

In most previous papers dealing with this subject, the asymptotic
behavior of $I(X;Y)$ has been analyzed for the case where the input
signal is weak, i.e., the noise is large. Here, we do not assume that
the power of the input signal or the noise goes to zero or infinity,
respectively, but we consider the case where the input signal
$\{X_j\}$, depending on a parameter $\varepsilon$, is a slowly
time-varying stationary Markov chain with a finite number of states
(i.e., its transition probabilities tend to 0 or 1 as
$\varepsilon \to 0$). Thus, the model (1) can be considered as a
special case of the well-known simple hidden Markov model. In [1], for
this kind of model, the asymptotics of the mean-square error for the
optimal estimates of $X_n$ from the observations
$Y_{-\infty}^{n} = \{Y_j,\ j \le n\}$ was found. We use the result of
[1] to derive the asymptotics of the information rate as
$\varepsilon \to 0$ for the channel model considered.

In  this  connection,  it  should  be  noted  that  some  relations 
between  the  mutual  information  and  a  causal  mean-square 
filtering  error  were  observed  many  years  ago.  But  most  results 
in  this  area  were  obtained  for  continuous-time  models  with 
additive  white  Gaussian  noise. 

¹This work was supported in part by the Russian Fundamental Research
Foundation under Grant 99-01-00828 and in part by INTAS under Grant
94-469.


II.  Main  Result 

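As already mentioned in the introduction, we consider a stationary
channel whose input signal
$X^\varepsilon = \{X_j^\varepsilon\}$ (and, therefore, the output
signal $Y^\varepsilon = \{Y_j^\varepsilon\}$) depends on a parameter
$\varepsilon > 0$ and

$$Y_j^\varepsilon = X_j^\varepsilon + Z_j, \quad j = 0, \pm 1, \ldots. \qquad (2)$$

It is assumed that $X^\varepsilon$ is a stationary, aperiodic, and
irreducible Markov chain with a finite number of states
$\{x_1, \ldots, x_m\}$, $x_i \in \mathbb{R}$, $i = 1, \ldots, m$, and
transition probabilities

$$p_{ij}^\varepsilon = P\{X_{n+1}^\varepsilon = x_j \mid X_n^\varepsilon = x_i\} = \lambda_{ij}\varepsilon, \quad i \ne j, \qquad (3)$$

where $\lambda_{ij} \ge 0$. We will also assume that $Z = \{Z_j\}$ is a
sequence of real-valued i.i.d. random variables independent of
$X^\varepsilon$.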

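Theorem. If $E|Z_0|^\beta < \infty$ for some $\beta > 4$, then

$$I(X^\varepsilon; Y^\varepsilon) = H(X^\varepsilon)(1 + o(1)), \quad \varepsilon \to 0,$$

where

$$H(X^\varepsilon) = \Big( \sum_{i=1}^{m} \sum_{j \ne i} q_i \lambda_{ij} \Big)\, \varepsilon \log\frac{1}{\varepsilon}\, (1 + o(1)), \quad \varepsilon \to 0,$$

is the entropy of the Markov chain $X^\varepsilon$, and
$\{q_k = \lim_{n \to \infty} P\{X_n^\varepsilon = x_k\},\ k = 1, \ldots, m\}$
is its stationary distribution (which does not depend on
$\varepsilon$).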
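The entropy asymptotics can be checked numerically for a two-state chain. The sketch assumes transition probabilities of the form $\lambda\varepsilon$ (state 1 to 2) and $\mu\varepsilon$ (state 2 to 1) and the leading term $(\sum_i \sum_{j \ne i} q_i \lambda_{ij})\,\varepsilon \log(1/\varepsilon)$; the function names are illustrative and natural logarithms are used throughout.

```python
import math

def entropy_rate_two_state(lam, mu, eps):
    """Exact entropy rate (nats) of a two-state chain with
    P(1->2) = lam*eps and P(2->1) = mu*eps."""
    def h(p):
        return -p * math.log(p) - (1 - p) * math.log(1 - p)
    q1, q2 = mu / (lam + mu), lam / (lam + mu)   # stationary distribution
    return q1 * h(lam * eps) + q2 * h(mu * eps)

def leading_term(lam, mu, eps):
    """(sum_i sum_{j != i} q_i * lambda_ij) * eps * log(1/eps)."""
    q1, q2 = mu / (lam + mu), lam / (lam + mu)
    return (q1 * lam + q2 * mu) * eps * math.log(1 / eps)

# Ratio of exact entropy rate to the leading term for decreasing eps.
ratios = [entropy_rate_two_state(1.0, 1.0, e) / leading_term(1.0, 1.0, e)
          for e in (1e-3, 1e-6, 1e-12)]
```

The ratio approaches 1 only slowly, since the relative correction is of order $1/\log(1/\varepsilon)$; note also that the stationary distribution in the sketch does not involve $\varepsilon$.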

The proof of this theorem can be found in [2]. There, we also compare
the asymptotic behavior of the information rates
$I(X^\varepsilon; X^\varepsilon + Z)$ for the channel model (2) under
the assumptions that: 1) $X^\varepsilon$ is a stationary Markov chain
with two states $x_1, x_2 \in \mathbb{R}$ and transition probabilities
(3) where $\lambda_{12} = \lambda$, $\lambda_{21} = \mu$; 2)
$X^\varepsilon$ is a stationary Gauss-Markov process with the same
covariance function as the Markov chain above. It is also assumed that
the noise $Z = \{Z_j\}$ in both cases is a sequence of i.i.d. Gaussian
random variables.
Abstract - The broadcast disk provides an effective way to transmit
information from a server to many clients. Information is broadcast
cyclically and clients pick the information they need out of the
broadcast. An example of such a system is a wireless web service where
web servers broadcast to browsing clients. Work has been done to
schedule the information broadcast so as to minimize the expected
waiting time of the clients. This work has treated the information as
indivisible blocks that are transmitted in their entirety. We propose a
new way to schedule the broadcast of information, which involves
splitting items into smaller sub-items, which need not be broadcast
consecutively. This relaxes the restrictions on scheduling and allows
for better schedules. We look at the case of two items of the same
length, each split into two halves, and show that we can achieve
optimal performance by choosing the appropriate schedule from a small
set of schedules.

I.  Model  and  Problem 

The broadcast disk is a way to send information to many clients at the
same time over a broadcast medium. The broadcast disk is a central
server that acts as a common cache for many clients. Data at the server
is made available cyclically to the clients, according to the broadcast
schedule. The goal is to schedule the broadcast information in a way
that minimizes the expected waiting time of the clients.

Vaidya and Hameed [1] worked out the optimal broadcast frequencies of
items within a schedule as a function of their demand probabilities,
$p_i$, and lengths, $l_i$. They showed that to minimize expected
waiting time, the frequencies of broadcast, $f_i$, should be
proportional to $\sqrt{p_i/l_i}$. This led to an algorithm that
attempts to achieve these relative frequencies. This algorithm is
attractive because it is computationally fairly simple and works for an
arbitrary number of broadcast items with arbitrary lengths and demand
probabilities. However, it relies on some assumptions about the spacing
of items that do not hold in most cases.

We  look  at  a  new  way  to  schedule  the  items,  which  allows 
us  to  achieve  better  expected  waiting  times.  We  consider  the 
case  of  two  items  of  the  same  length,  and  we  split  each  item 
into  two  halves.  We  then  schedule  these  pieces  of  the  items 
for  broadcast.  We  find  the  optimal  schedule  under  these  con¬ 
ditions  based  on  the  demand  probabilities. 

We represent a schedule by a sequence of numbers. Each number
represents a piece of an item. For example, 1122 means we broadcast two
pieces of item 1 followed by two pieces of item 2. To determine which
piece of an item to send, we look at which piece of that item was sent
last and send the other piece.

¹This research was partially supported by the Lee Center for Advanced
Networking at Caltech.

Our  metric  for  evaluating  schedules  is  the  following: 

Definition 1 $EWT(S, p_1)$ is the expected waiting time using schedule
$S$ with demand probabilities $p_1$ and $p_2 = 1 - p_1$, assuming two
items, each of length one unit, split into two halves.

By "expected waiting time", we mean the total amount of time that a
client spends listening to the broadcast channel, not including the
time spent obtaining the desired data. We assume that clients start
listening at random times uniformly distributed over the broadcast
cycle.
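A possible way to evaluate this metric for a given schedule is sketched below. The model is an assumption made for illustration: each digit is a half-item of duration 1/2, successive pieces of the same item alternate between its two halves, a client needs both halves in either order and can only use a piece it hears from its start, and one unit of reception time is subtracted from the total listening time. The function name `expected_wait` is hypothetical.

```python
from fractions import Fraction

def expected_wait(schedule, p1):
    """Expected waiting time of a cyclic schedule such as "1122".

    Assumed model: the client arrives uniformly inside some slot, then
    needs the next two complete pieces of its item (they are always
    complementary halves); the waiting time is the listening time minus
    the one unit spent receiving the item itself.
    """
    half = Fraction(1, 2)
    n = len(schedule)
    total = Fraction(0)
    for item, prob in (("1", Fraction(p1)), ("2", 1 - Fraction(p1))):
        if item not in schedule:
            raise ValueError("every item must appear in the schedule")
        for slot in range(n):
            # Locate the second piece of `item` starting after the arrival slot.
            seen, k = 0, slot
            while seen < 2:
                k += 1
                if schedule[k % n] == item:
                    seen += 1
            finish = (k + 1) * half               # end of that second piece
            mean_arrival = (slot + half) * half   # midpoint of the arrival slot
            total += prob * (finish - mean_arrival - 1) / n
    return total

ewt_split = expected_wait("1122", Fraction(1, 2))
ewt_interleaved = expected_wait("1212", Fraction(1, 2))
```

Under these assumptions, keeping the two halves of an item adjacent (1122) gives a shorter expected wait than interleaving them (1212) when both items are equally popular.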

II.  Summary  of  Results 

The  main  result  is  the  following: 

Theorem  1  For  two  items  of  the  same  length,  each  split  into 
two  halves,  the  broadcast  schedule  that  minimizes  expected 
waiting  time  is: 

1122,  if  pi  €  (&,i] 

11222,  if  pi  €  (&,&] 

112222,  if  pi  e  (±,&] 

n 

1221222^2, n  =  Max  ^0, 

(0. 1] 

For a more detailed discussion of this result, refer to [2]. This
theorem tells us that with two items of equal length, the optimal
schedule is a simple function of the demand probability $p_1$. To prove
this result, we first prove some lemmas about comparing the waiting
times of different schedules. Then, we use these lemmas to narrow the
set of candidate schedules to a small set. From this set, we
numerically compare the schedules to find which is best and for what
value of $p_1$.

It is surprising that the optimal schedule is such a simple function of
$p_1$. The set of possible schedules is uncountably infinite. Using
certain rules of manipulation, we can reduce this uncountable set to a
countable set, which is essentially 25 types of schedules, each
parameterized by length. From these, we see that only the small set of
schedules in the theorem is optimal.
Abstract - The goal of this paper is to complete results about
I-projections and reverse I-projections, and to correct some errors in
the literature. A new tool is the concept of convex support of a
probability measure, better suited for our purposes than the familiar
closed convex support.

I.  Preliminaries 

For probability measures (pm's) on the same measurable space,
$D(P\|Q)$ denotes information divergence (relative entropy). Its
infimum for $P$ or $Q$ in a set $S$ of pm's is denoted by $D(S\|Q)$ and
$D(P\|S)$, respectively. If a unique minimizer exists here, it is
called the I-projection of $Q$ to $S$ or the reverse I-projection
(rI-projection) of $P$ to $S$. Such projections, particularly to
linear and exponential families of pm's, respectively, occur in various
problems of probability and statistics. Previous works studying these
projections include Čencov [1], Csiszár [2], [3], Topsøe [5], etc.

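We will consider linear families $\mathcal{L}_a = \{P : \int x\,dP = a\}$ for
$a \in \mathbb{R}^d$, and exponential families of pm’s on $\mathbb{R}^d$

$$\mathcal{E}_Q = \Big\{ Q_\vartheta : \frac{dQ_\vartheta}{dQ}(x) = \exp[\langle\vartheta,x\rangle - \Lambda_Q(\vartheta)],\ \vartheta \in \mathrm{dom}\,\Lambda_Q \Big\},$$

where

$$\Lambda_Q(\vartheta) = \log \int \exp\langle\vartheta,x\rangle\,dQ, \qquad \mathrm{dom}\,\Lambda_Q = \{\vartheta : \Lambda_Q(\vartheta) < \infty\};$$

more general situations can be easily reduced to this [3].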

We define the convex support $\mathrm{cs}(Q)$ of a pm $Q$ on $\mathbb{R}^d$ as
the intersection of all convex sets of $Q$-measure 1.

Theorem 1. $D(\mathcal{L}_a\|Q)$ is finite iff $a \in \mathrm{cs}(Q)$.

We also introduce the extended exponential family

$$\mathrm{ext}(\mathcal{E}_Q) = \bigcup \{ \mathcal{E}_{Q_F} : F \text{ non-empty face of } \mathrm{cs}(Q) \}$$

(see [4] for the definition of and basic facts about faces), where
$Q_F$ denotes the conditional distribution determined by $Q$
conditioned on $\bar{F}$, the closure of $F$. Note that
$\mathcal{E}_Q \subset \mathrm{ext}(\mathcal{E}_Q)$ with equality iff $\mathrm{cs}(Q)$ is open.

A similar construction appears in [1], using closed convex
support (equal to $\overline{\mathrm{cs}(Q)}$) rather than $\mathrm{cs}(Q)$, but several
assertions there are false. The ‘right’ extension concept permits us
to correct those.

Example. Let $Q$ be the normalized sum of the Lebesgue
measure on the unit square and the point masses $\delta_b$, $\delta_c$ at
b = (|,0), c = (§,0). Then $\delta_b$ and $\delta_c$ belong to $\mathrm{ext}(\mathcal{E}_Q)$ but
not to the union of $\mathcal{E}_Q$ with its ‘boundary at infinity’ in the
sense of [1]; they have no rI-projection to that union,
contradicting Theorem 23.3 of [1].

¹This work was supported by the HSSS programme of ESF, by
the Hungarian National Foundation for Scientific Research, Grant
T 26041 and by Grant Agency of Academy of Sciences of the Czech
Republic, Grant A 1075801.

II.  Main  results 

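Let $\{R : D(S\|R) = 0\}$ be the I-closure $\mathrm{cl}_I(S)$ and
$\{R : D(R\|S) = 0\}$ the reverse I-closure $\mathrm{cl}_{rI}(S)$ of a set $S$ of pm’s.

Theorem 2. For every exp. family $\mathcal{E} = \mathcal{E}_Q$ and $a \in \mathrm{cs}(Q)$
there exists a unique pm $Q^*_{a,\mathcal{E}}$ in $\mathrm{cl}_I(\mathcal{L}_a) \cap \mathrm{ext}(\mathcal{E})$. It satisfies

$$D(P\|R) = D(P\|Q^*_{a,\mathcal{E}}) + D(\mathcal{L}_a\|R), \qquad P \in \mathcal{L}_a,\ R \in \mathrm{ext}(\mathcal{E}).$$

The pm $Q^*_{a,\mathcal{E}}$ belongs to $\mathcal{E}_{Q_F}$, where $F$ denotes the unique face
of $\mathrm{cs}(Q)$ whose relative interior contains $a$.

Corollary 1. For a pm $Q$ and $a \in \mathrm{cs}(Q)$ the I-projection of
$Q$ on $\mathcal{L}_a$ exists iff $\mathcal{L}_a$ intersects $\mathrm{ext}(\mathcal{E}_Q)$. A sufficient condition
for the latter is $\mathrm{dom}\,\Lambda_Q = \mathbb{R}^d$.

Corollary 2. If $\mathcal{E}$ is an exponential family and $P$ is a pm
with mean $a$ such that $D(P\|\mathrm{ext}(\mathcal{E}))$ is finite, then the reverse
I-projection of $P$ to $\mathrm{ext}(\mathcal{E})$ exists and equals $Q^*_{a,\mathcal{E}}$.

Theorem 3. For every exp. family $\mathcal{E} = \mathcal{E}_Q$ and $a \in \mathrm{cs}(Q)$
there exists a unique $P^*_{a,\mathcal{E}}$ in $\mathrm{cl}_{rI}(\mathcal{E})$ such that

$$D(P\|P^*_{a,\mathcal{E}}) = D(P\|\mathcal{E}) = D(P\|\mathrm{cl}_{rI}(\mathcal{E})), \qquad P \in \mathcal{L}_a.$$

This $P^*_{a,\mathcal{E}}$ has a mean $a^*$, and satisfies

$$D(P\|R) \ge D(P\|\mathcal{E}) + D(P^*_{a,\mathcal{E}}\|R), \qquad P \in \mathcal{L}_a,\ R \in \mathrm{cl}_{rI}(\mathcal{E}).$$

Corollary 3. If $\mathcal{E}$ is an exponential family and $P$ is a pm
with mean $a$ such that $D(P\|\mathcal{E})$ is finite, then the rI-projection
of $P$ to $\mathrm{cl}_{rI}(\mathcal{E})$ exists and equals $P^*_{a,\mathcal{E}}$. The rI-projection of $P$
to $\mathcal{E}$ exists iff $P^*_{a,\mathcal{E}} \in \mathcal{E}$.

Corollary 4. The following assertions are equivalent:
1. $D(\mathcal{L}_a\|\mathcal{E}) = 0$
2. $P^*_{a,\mathcal{E}} = Q^*_{a,\mathcal{E}}$
3. $\mathrm{cl}_I(\mathcal{L}_a) \cap \mathrm{cl}_{rI}(\mathcal{E})$ is nonempty.

Theorem 4. $\mathrm{ext}(\mathcal{E}_Q)$ is variation closed. A sufficient condition
for the equality in $\mathrm{cl}_{rI}(\mathcal{E}_Q) \subset \mathrm{ext}(\mathcal{E}_Q)$ is $\mathrm{dom}\,\Lambda_Q = \mathbb{R}^d$.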
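
The simplest case of Theorem 2 and Corollary 1 can be illustrated numerically. The sketch below is not taken from the paper: it assumes a finite alphabet `xs` in one dimension, a strictly positive pm `q`, and a target mean `a` in the interior of the convex support, so that the I-projection of `q` on the linear family lies in the exponential family itself. The function names and the bisection search on the tilting parameter are illustrative choices.

```python
import math

def kl(p, q):
    """Information divergence D(p||q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilt(q, xs, theta):
    """Exponential-family member: Q_theta(x) proportional to q(x) * exp(theta * x)."""
    w = [qi * math.exp(theta * x) for qi, x in zip(q, xs)]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p, xs):
    return sum(pi * x for pi, x in zip(p, xs))

def i_projection(q, xs, a, lo=-50.0, hi=50.0, iters=200):
    """Bisection on theta: the mean of Q_theta is increasing in theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, xs, mid), xs) < a:
            lo = mid
        else:
            hi = mid
    return tilt(q, xs, 0.5 * (lo + hi))

xs = [0.0, 1.0, 2.0, 3.0]          # hypothetical alphabet
q = [0.4, 0.3, 0.2, 0.1]           # hypothetical reference pm
a = 2.0                            # target mean, inside (0, 3)
q_star = i_projection(q, xs, a)

# Any P with mean a satisfies the Pythagorean identity with R = q.
p = [0.25, 0.0, 0.25, 0.5]         # mean = 0.5 + 1.5 = 2.0
lhs = kl(p, q)
rhs = kl(p, q_star) + kl(q_star, q)
print(q_star, lhs, rhs)
```

In this finite, full-support setting the faces of the convex support play no role; the extended family only matters when `a` lies on the boundary of the convex support.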

Abstract — The information value decomposition
approximates  a  positive-valued  matrix  by  a  sequence 
of  reduced  rank  matrices.  A  rank  K  approximating 
matrix  is  closest  to  the  original  matrix  in  the  sense 
of  minimizing  the  discrimination  between  the  original 
matrix  and  the  approximation,  over  rank  K  matrices. 
The  information  value  decomposition  is  analogous  to 
the  singular  value  decomposition  with  discrimination 
used  for  the  discrepancy  measure  instead  of  squared 
error.  Several  properties  of  the  information  value  de¬ 
composition  correspond  to  properties  of  the  singular 
value  decomposition.  These  properties  are  discussed. 

I.  Introduction 

The  singular  value  decomposition  is  arguably  the  most  im¬ 
portant  tool  in  numerical  linear  algebra  and  is  used  widely  in 
virtually  all  areas  of  science  and  engineering.  The  singular 
value  decomposition  computes  the  real-valued,  rank  K  ap¬ 
proximation  to  a  real-valued  matrix  that  is  closest  to  that 
matrix,  where  the  closeness  or  discrepancy  is  measured  us¬ 
ing  squared  error.  The  information  value  decomposition  com¬ 
putes  the  closest  positive-valued,  rank  K  approximation  to  a 
positive- valued  matrix,  where  discrimination  (or  I-divergence) 

$$I(A\|B) = \sum_i \sum_j \Big( a_{ij} \ln\frac{a_{ij}}{b_{ij}} - a_{ij} + b_{ij} \Big) \qquad (1)$$

is  the  discrepancy  measure. 

The  information  value  decomposition  is  useful  for  problems 
where  the  data  are  naturally  positive-valued,  including  prob¬ 
lems  in  optical  and  hyperspectral  imaging,  and  in  approxima¬ 
tions  of  joint  probabilities. 

Several properties of the information value decomposition
result from properties of discrimination (see the work of
Csiszár [1, 2, 3]). Two properties are equivalent to the
successive projection property of squared error. Let $\mathcal{L}$ be any
nonempty linear set

$$\mathcal{L} = \{ p \in \mathbb{R}_+ : Qp = q \}. \qquad (2)$$

Then for any $p \in \mathcal{L}$,

$$I(p\|r) = I(p\|p^*) + I(p^*\|r), \qquad (3)$$

where $p^* = \arg\min_{p \in \mathcal{L}} I(p\|r)$. Let $\mathcal{E}$ be any nonempty
exponential set

$$\mathcal{E} = \Big\{ r \in \mathbb{R}_+ : r_i = \pi_i \exp\Big(\sum_j P_{ij}\mu_j\Big), \text{ for some } \mu \Big\}. \qquad (4)$$

Then, for any $r \in \mathcal{E}$,


where $r^* = \arg\min_{r \in \mathcal{E}} I(p\|r)$. These two successive projection
properties are central to the analysis of the information value
decomposition.

Many  alternating  minimization  algorithms  [3,  4]  can  be 
rewritten  as  minimizing  the  first  variable  over  a  linear  set  and 
minimizing  the  second  variable  over  an  exponential  set 

$$\min_{r \in \mathcal{E}} \min_{p \in \mathcal{L}} I(p\|r). \qquad (6)$$

The computation of the information value decomposition may
be written in this form, yielding an iterative algorithm for
a rank $K$ approximation. Write the approximating matrix
in terms of factors as $B = XV^T$, where $X$ and $V$
are nonnegative-valued matrices with $K$ columns, and the
columns of $V$ sum to 1. Denote the set of all such rank $K$
matrices as $P(K,m,n)$. The optimal rank $K$ matrix is found
as $B^{(K)} = X^{(K)} V^{(K)T}$ and achieves

$$B^{(K)} = \arg\min_{B \in P(K,m,n)} I(A\|B). \qquad (7)$$

The  rank  one  solution  is  given  by  the  normalized  marginals 
on  the  columns  and  rows  of  A.  If  the  entries  of  A  sum  to 
1,  the  resulting  discrimination  equals  the  mutual  information 
between  two  random  variables  whose  joint  distribution  is  A.  If 
the  entries  do  not  sum  to  1,  the  discrimination  is  proportional 
to  such  a  mutual  information,  with  proportionality  constant 
equal  to  the  total  sum  of  the  entries  of  A. 
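
The rank one statement above can be checked on a small example. The following sketch assumes a hypothetical 2 x 3 positive matrix `A`; it forms the rank one approximation from the row and column sums and compares the resulting discrimination with the total sum times the mutual information of the normalized matrix. The function names are illustrative and not from the paper.

```python
import math

def i_divergence(a, b):
    """Discrimination I(A||B) = sum_ij a_ij ln(a_ij / b_ij) - a_ij + b_ij."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += x * math.log(x / y) - x + y
    return total

def rank_one(a):
    """Outer product of row sums and column sums, divided by the total sum."""
    rows = [sum(r) for r in a]
    cols = [sum(c) for c in zip(*a)]
    t = sum(rows)
    return [[ri * cj / t for cj in cols] for ri in rows]

def mutual_information(a):
    """Mutual information (nats) of the joint distribution A / sum(A)."""
    t = sum(sum(r) for r in a)
    p = [[x / t for x in r] for r in a]
    pr = [sum(r) for r in p]
    pc = [sum(c) for c in zip(*p)]
    return sum(x * math.log(x / (pr[i] * pc[j]))
               for i, r in enumerate(p) for j, x in enumerate(r))

A = [[4.0, 1.0, 1.0],
     [1.0, 2.0, 3.0]]              # hypothetical positive-valued data
B1 = rank_one(A)
total = sum(sum(r) for r in A)
d = i_divergence(A, B1)
mi = mutual_information(A)
print(d, total * mi)
```

In the factored form $B = XV^T$, this corresponds to taking the single column of $X$ as the row sums of $A$ and the single column of $V$ as the column sums divided by the total.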

The  successive  approximation  properties  yield  expressions 
for  the  improvement  going  from  rank  K  to  rank  K  +  1  ap¬ 
proximations. 

Abstract — In this paper, a one-to-one correspondence
between group-theoretic inequalities and
information-theoretic inequalities is established.
The consequence is that we can prove an
information-theoretic inequality by proving its corresponding
group-theoretic inequality and vice versa. Finally, a
new non-trivial group-theoretic inequality is found
using this approach. The meaning of this inequality is
yet to be understood.

I.  Group-theoretic  inequalities 
Definition I.1 Let $G$ be a finite group and $G_1, G_2, \ldots, G_n$ be
subgroups of $G$. A group-theoretic inequality is an inequality
that involves only additions or subtractions of terms of the
form $\log\frac{|G|}{|G_\alpha|}$, where $|G|$ is the order of the group $G$ and $|G_\alpha|$
is the order of the subgroup $\bigcap_{i\in\alpha} G_i$.

A group-theoretic inequality is valid if and only if it is satisfied
by all finite groups. For example,
$\log\frac{|G|}{|G_1\cap G_3|} + \log\frac{|G|}{|G_2\cap G_3|} \ge \log\frac{|G|}{|G_1\cap G_2\cap G_3|} + \log\frac{|G|}{|G_3|}$
is a valid group-theoretic inequality.

II.  A  One-to-one  correspondence 

For any information inequality, we can establish a
group-theoretic inequality through the following transformation. For
any entropy term in the information inequality, say $H(X_\alpha)$,
the entropy of the joint random variable $(X_i : i \in \alpha)$, we
change it to $\log\frac{|G|}{|G_\alpha|}$. Then the inequality obtained is a
group-theoretic inequality. For example, given the information
inequality $H(X_1,X_3) + H(X_2,X_3) \ge H(X_1,X_2,X_3) + H(X_3)$,
after transformation, we obtain the group-theoretic inequality
$\log\frac{|G|}{|G_1\cap G_3|} + \log\frac{|G|}{|G_2\cap G_3|} \ge \log\frac{|G|}{|G_1\cap G_2\cap G_3|} + \log\frac{|G|}{|G_3|}$.
It can be seen easily that the transformation is reversible, that is, given
any group-theoretic inequality, we can find its corresponding
information inequality.
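
One way to see why the term $\log(|G|/|G_\alpha|)$ behaves like a joint entropy is a standard construction that this abstract does not spell out, so it is an assumption here: draw $g$ uniformly from $G$ and let $X_i$ be the coset of $G_i$ containing $g$. The sketch below checks, for the cyclic group $\mathbb{Z}_{12}$ and three hypothetical subgroups, that the joint entropy of the coset variables equals $\log(|G|/|G_\alpha|)$.

```python
import math
from collections import Counter
from itertools import combinations

N = 12                              # the group Z_12 under addition mod 12

def subgroup(d):
    """Subgroup of Z_N generated by d."""
    return frozenset((d * k) % N for k in range(N))

def coset(g, h):
    """Coset g + H, used as a hashable label for the random variable X_i."""
    return frozenset((g + x) % N for x in h)

def joint_entropy(subgroups):
    """Entropy (nats) of (g + H_1, ..., g + H_m) for g uniform on Z_N."""
    counts = Counter(tuple(coset(g, h) for h in subgroups) for g in range(N))
    return -sum(c / N * math.log(c / N) for c in counts.values())

def log_index(subgroups):
    """log(|G| / |intersection of the given subgroups|)."""
    inter = frozenset.intersection(*subgroups)
    return math.log(N / len(inter))

G1, G2, G3 = subgroup(2), subgroup(3), subgroup(4)
family = [G1, G2, G3]
for r in (1, 2, 3):
    for subset in combinations(family, r):
        print(len(subset), joint_entropy(subset), log_index(subset))
```

With this choice, the example inequality $H(X_1,X_3) + H(X_2,X_3) \ge H(X_1,X_2,X_3) + H(X_3)$ holds with equality.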

The  main  result  of  this  work  is  the  following  theorem. 

Theorem  II. 1  A  linear  information  inequality  is  valid  if  and 
only  if  its  corresponding  group-theoretic  inequality  is  valid. 

This theorem establishes an intriguing relation between
entropy and groups in the form of inequalities. A trivial
implication of the theorem is that if we can prove an information
inequality, then we also prove the corresponding
group-theoretic inequality, and vice versa.

III.  Group-theoretic  Approach 

Suppose  we  want  to  prove  a  linear  information  inequality.  We 
can  prove  it  in  two  steps. 

step  1:  Transform  the  information  inequality  to  its  corre¬ 
sponding  group-theoretic  inequality. 

step  2:  Prove  that  the  group-theoretic  inequality  is  true  for 
all  groups. 


Example III.1 Suppose we want to prove $H(X_1) + H(X_2) - H(X_{1,2}) \ge 0$.
It suffices to show that
$\log\frac{|G|}{|G_1|} + \log\frac{|G|}{|G_2|} - \log\frac{|G|}{|G_1\cap G_2|} \ge 0$,
or equivalently, that $|G_1||G_2| \le |G||G_1\cap G_2|$ is
satisfied by all groups and their subgroups. But it is a trivial
result in group theory that $|G_1||G_2| \le |G||G_1\cap G_2|$. Hence, the
result follows.
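
As a sanity check of Example III.1, the inequality $|G_1||G_2| \le |G||G_1 \cap G_2|$ can be verified exhaustively on a small non-abelian group. The sketch below uses the symmetric group $S_3$, with permutations stored as tuples; the choice of group and the helper names are illustrative assumptions, and a finite check is of course no substitute for the general proof.

```python
from itertools import permutations, product

def compose(p, q):
    """Composition of permutations: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def generate(gens, n=3):
    """Subgroup generated by gens (closure under multiplication in a finite group)."""
    identity = tuple(range(n))
    elems = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return frozenset(elems)

S3 = list(permutations(range(3)))
whole = generate(S3)
# Every proper subgroup of S_3 is cyclic, so single generators suffice.
subgroups = {generate([g]) for g in S3} | {whole}

all_hold = all(len(g1) * len(g2) <= len(whole) * len(g1 & g2)
               for g1, g2 in product(subgroups, repeat=2))
print(len(subgroups), all_hold)
```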

All  commonly  known  information  inequalities  can  be 
proved  by  using  this  group-theoretic  approach.  Moreover,  all 
the  tools  developed  in  group  theory  can  be  used  to  prove  in¬ 
formation  inequalities.  This  approach  enlarges  our  set  of  tools 
for  proving  information  inequalities. 

IV.  Information-theoretic  approach 

As in the previous section, we can use an information-theoretic
approach for proving group-theoretic inequalities. The
procedure is similar, and we omit the details here. But an
interesting result obtained by using this approach deserves
mentioning. A new information inequality has recently been proved
by Zhang and Yeung in [2]. This information inequality is
highly non-trivial in that it cannot be deduced from the
commonly known information inequalities. This inequality, in
terms of joint entropies, is as follows:

$$\begin{aligned}
&H(X_1) + H(X_2) + 2H(X_{1,2}) + 4H(X_3) + 4H(X_4) + 5H(X_{1,3,4}) + 5H(X_{2,3,4}) \\
&\quad \le 6H(X_{3,4}) + 4H(X_{1,3}) + 4H(X_{1,4}) + 4H(X_{2,3}) + 4H(X_{2,4})
\end{aligned}$$

The  corresponding  group  theoretic  inequality, 

$$|G_3\cap G_4|^6\,|G_1\cap G_3|^4\,|G_1\cap G_4|^4\,|G_2\cap G_3|^4\,|G_2\cap G_4|^4 \le |G_1|\,|G_2|\,|G_3|^4\,|G_4|^4\,|G_1\cap G_2|^2\,|G_1\cap G_3\cap G_4|^5\,|G_2\cap G_3\cap G_4|^5$$

appears  to  be  new  in  group  theory,  but  its  meaning  is  yet  to 
be  understood. 
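
The group-theoretic inequality can be spot-checked by brute force on a very small group. The sketch below assumes the symmetrized form of the Zhang-Yeung inequality, in which the exponents are 6 on $|G_3 \cap G_4|$, 4 on the other pairwise intersections and on $|G_3|$, $|G_4|$, 2 on $|G_1 \cap G_2|$, and 5 on the two triple intersections. It tests all 4-tuples of subgroups of the Klein four-group, with elements encoded as the integers 0 to 3 under bitwise XOR. Both the encoding and the choice of group are illustrative assumptions.

```python
from itertools import product

# All subgroups of Z_2 x Z_2 (elements 0..3, group operation is XOR).
subgroups = [
    frozenset({0}),
    frozenset({0, 1}),
    frozenset({0, 2}),
    frozenset({0, 3}),
    frozenset({0, 1, 2, 3}),
]

def order(*gs):
    """Order of the intersection of the given subgroups."""
    return len(frozenset.intersection(*gs))

def zy_holds(g1, g2, g3, g4):
    """Group form of the (symmetrized) Zhang-Yeung inequality, as assumed above."""
    lhs = (order(g3, g4) ** 6 * order(g1, g3) ** 4 * order(g1, g4) ** 4
           * order(g2, g3) ** 4 * order(g2, g4) ** 4)
    rhs = (order(g1) * order(g2) * order(g1, g2) ** 2
           * order(g3) ** 4 * order(g4) ** 4
           * order(g1, g3, g4) ** 5 * order(g2, g3, g4) ** 5)
    return lhs <= rhs

all_hold = all(zy_holds(*t) for t in product(subgroups, repeat=4))
print(all_hold)
```

Several tuples meet the inequality with equality, for example when all four subgroups coincide.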

Abstract — Information processing is performed
when  a  system  preserves  aspects  of  the  information 
encoded  in  the  input  and  removes  other  aspects.  We 
describe  an  approach  to  quantify  such  information 
processing  based  on  applying  controlled  changes  to 
the  input  and  observing  the  corresponding  outputs. 
Information-theoretic  distance  measures— those  that 
reflect  the  data  processing  theorem— are  calculated  on 
the  input  and  output  separately  and  compared.  Prop¬ 
erties  of  the  resulting  information  transfer  ratio  are 
used  to  quantify  the  system’s  fundamental  informa¬ 
tion  processing  properties. 

I.  Introduction 

In general, processing is performed when a system enhances
certain aspects of its input as it suppresses others. While some
systems only re-represent the input signal without loss, such
as an ideal amplifier or the Fourier transform, others do have
a loss and act as “information filters.” To develop a measure
that would characterize a system’s information processing
capability, we need to compare input(s) and output(s)
somehow. In linear systems, one uses the transfer function
or cross-correlation. However, in quantifying the processing
of more complex systems, non-linearities and non-Gaussian
effects cause classical methods to fail to capture all that a
system does. Furthermore, in the case of mixed input and
output (e.g., continuous input, discrete output), it is difficult
to find the joint distribution of the input and output. We
induce controlled changes of the information represented by
a system’s input and compare distances between inputs and
between outputs using the Kullback-Leibler distance. By
considering distance changes thus induced, we essentially specify
what information is conveyed and processed. Finally, by
measuring the difference between two inputs before and after the
change and comparing this difference to the corresponding
output difference, we quantify how a system processes
relevant information.


II.  Quantifying  Information  Processing 

We represent information by a collection of parameters
coalesced into the vector $\theta$. Let $X$ represent a system’s input
signal and $Y$ its output. According to the data processing
theorem [2], if $\theta \to X \to Y$ ($\theta$, $X$, and $Y$ form a Markov
chain), then $I(\theta; X) \ge I(\theta; Y)$. Let $X(\theta_0)$, $X(\theta_1)$ represent
input signals having different information content with $Y(\theta_0)$,
$Y(\theta_1)$ representing the corresponding outputs. Many distance
measures, which we generically write as $d(\cdot,\cdot)$, also satisfy the
data processing theorem in the sense that

$$\gamma_{X,Y}(\theta_0,\theta_1) = \frac{d(Y(\theta_0),Y(\theta_1))}{d(X(\theta_0),X(\theta_1))} \le 1.$$


"Work  supported  by  NSF  Grant  CCR-9628236. 


All Ali-Silvey distances [1] satisfy the data processing
theorem by construction. We use one particular Ali-Silvey
distance, the Kullback-Leibler (KL) distance, extensively
because of its convenience and importance. We explore the
quantity $\gamma_{X,Y}$, the information transfer ratio, defined as the
ratio of the distance between the two output distributions and
the distance between the corresponding input distributions.
This ratio is always between zero and one: zero means none of
the information change $\theta_0 \to \theta_1$ is represented by the output
and one means perfect reproduction of the input information
change.


III.  A  System  Theory  of  Information 
Processing 


If two systems are in cascade, the overall information
transfer ratio is the product of the component ratios: if
$\theta \to X \to Y \to Z$ form a Markov chain, $\gamma_{X,Z} = \gamma_{X,Y} \cdot \gamma_{Y,Z}$
regardless of the distance measure used.
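
A minimal numerical sketch of the transfer ratio and the cascade property follows. It assumes a scalar Gaussian input whose mean carries the parameter, and two hypothetical systems that each add independent Gaussian noise; for equal-variance Gaussians the KL distance has the closed form $(\theta_1 - \theta_0)^2 / (2\sigma^2)$. None of these modelling choices comes from the abstract.

```python
def kl_gauss_mean_shift(theta0, theta1, var):
    """KL distance (nats) between N(theta0, var) and N(theta1, var)."""
    return (theta1 - theta0) ** 2 / (2.0 * var)

def transfer_ratio(theta0, theta1, var_in, var_out):
    """Ratio of output KL distance to input KL distance."""
    return (kl_gauss_mean_shift(theta0, theta1, var_out)
            / kl_gauss_mean_shift(theta0, theta1, var_in))

theta0, theta1 = 0.0, 1.0
var_x = 1.0            # X ~ N(theta, 1)
var_y = var_x + 0.5    # Y = X + noise of variance 0.5 (assumed system 1)
var_z = var_y + 2.0    # Z = Y + noise of variance 2.0 (assumed system 2)

g_xy = transfer_ratio(theta0, theta1, var_x, var_y)
g_yz = transfer_ratio(theta0, theta1, var_y, var_z)
g_xz = transfer_ratio(theta0, theta1, var_x, var_z)
print(g_xy, g_yz, g_xz, g_xy * g_yz)
```

Each ratio lies between zero and one, and the ratio for the cascade equals the product of the two component ratios.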

The special case wherein the information parameter is
perturbed ($\theta_1 = \theta_0 + \delta\theta$) yields an interesting result. When the
distance measure is in the Ali-Silvey class, we can explicitly
write the information transfer ratio, under very general
assumptions, as

$$\gamma_{X,Y}(\theta_0, \theta_0 + \delta\theta) = \frac{\delta\theta'\, F_Y(\theta_0)\, \delta\theta}{\delta\theta'\, F_X(\theta_0)\, \delta\theta},$$

where $F$ is the Fisher information matrix. We refer to this result as the
local invariance property: the information transfer ratio for
perturbational changes is invariant to the choice of distance
measure.

Notice that the two previous results hold for any Ali-Silvey
distance used in the information transfer ratio. However, the
KL distance is especially convenient since it is related to both
detection (Stein’s lemma) and estimation theory (Fisher
information matrix). The following results are derived using the
KL distance.

When the input consists of several statistically independent
components, the overall information transfer ratio is related
to the individual transfer ratios by an expression identical to the
parallel resistor formula:

$$\frac{1}{\gamma_{X,Y}(\theta_0,\theta_1)} = \sum_i \frac{1}{\gamma_{X_i,Y}(\theta_0,\theta_1)}.$$

Finally, consider the system with one input and $N$ outputs
that are conditionally independent given the input ($N$ parallel
systems is one example). We calculated how the information
transfer ratio changes as more outputs are added for two
special cases. In both cases, as $N \to \infty$, $\gamma_{X,Y}(\theta_0,\theta_1) \to 1$, and the
asymptotic differential increase in $\gamma$ is proportional to $1/N^2$.
We believe this result applies more generally.

Abstract — The problem of minimizing sum transmit
power for a fading multiple-access channel is
considered. While for non-fading channels with equal
user attenuations wideband multi-access with
cooperative (joint) decoding and orthogonal access are both
optimal, wideband access is superior in case of
unequal attenuations or fading channels. Losses due to
fading are negligible for wideband access as a result
of low rate coding, and near AWGN performance is
achievable. On the other hand, orthogonal access
suffers from losses of high rate codes under fading
conditions. Wideband access with non-cooperative
decoding is clearly suboptimal in any case and only useful
for low transmission rates.


I.  Introduction 

A multiple-access channel is considered where $K$ users are
accessing the channel at information rates $R_k = R$, $k \in \mathcal{K} =
\{1, \ldots, K\}$. Each transmission signal $X_k$ is attenuated in
power by a constant $1/\mu_k$ due to path losses and randomly
attenuated by $H_k$ due to multipath propagation assumed to
result in flat fading. The fading is perfectly known at the base
station for all signals individually but unknown at the
transmitters. The transmitters merely have access to the average
attenuations $1/\mu_k$, via a low-rate feedback link, which is used
to control the powers.

The  problem  of  minimizing  sum  transmit  power  is  discussed 
for  wideband  accessing  of  all  users  with  optimal  cooperative 
(joint)  decoding  (WB-CD)  and  independent,  non-cooperative 
decoding  (WB-NCD),  respectively,  as  well  as  orthogonal  ac¬ 
cessing  techniques  (OA). 


II.  Results 

For WB-CD, the set $\mathcal{P}_{CD}$ of required powers for equal rate
transmission is given implicitly by [1] [2]¹

$$\mathcal{P}_{CD} = \Big\{ P \in \mathbb{R}_+^K : |S|\,R \le E\Big[ C\Big( \sum_{k \in S} H_k P_k,\ \sigma^2 \Big) \Big],\ \forall S \subseteq \mathcal{K} \Big\}. \qquad (1)$$

The problem of finding the minimum sum transmit power
$P_T$ to support reliable transmission at rates $R$ may thus be
formulated as

$$P_T = \min_{P \in \mathcal{P}_{CD}} \sum_k \mu_k P_k. \qquad (2)$$

The region $\mathcal{P}_{CD}$ is proved to be convex, and thus the
minimum can be found using Kuhn-Tucker multipliers. However,
$\mathcal{P}_{CD}$ does not form a contra-polymatroid, and the optimum
point $P^*$ minimizing sum transmit power need not be a
vertex. The vertices of $\mathcal{P}_{CD}$ possess the outstanding property of
being achievable by low complexity stripping [3]. By focusing
on the best vertex a practically attractive solution is obtained
which usually is close or equal to the optimum.

¹ $C(P,\sigma^2) = \frac{1}{2}\log_2\big(1 + \frac{P}{\sigma^2}\big)$ and $E_1(x) = \int_x^\infty \frac{e^{-t}}{t}\,dt$.

With WB-NCD, all users are decoded individually and in
parallel, considering all other users as noise. The optimum
receive power $P^*$ per user is given implicitly by¹


R  = 


eP. 


2  In  2  (K  -  2) 


Uf 


2e1 


dt. 


(3) 


For  OA,  the  optimum  receive  power  P*  is  given  by 


R  = 


2K  In  2 


e«p-  Ei 


(4) 


Both non-cooperative decoding and orthogonal access
eliminate inter-user trade-offs and result in a minimum sum
transmit power given by

$$P_T = P^* \sum_{k \in \mathcal{K}} \mu_k, \qquad (5)$$

which is strictly greater than the best achievable with
WB-CD even for equal attenuations. WB-NCD suffers from
suboptimal decoding and OA from suboptimal accessing as well
as losses of high rate codes under fading conditions.
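
The contrast between low rate and high rate coding under fading can be made concrete with a small numerical sketch. It assumes Rayleigh fading, i.e. a unit-mean exponentially distributed power gain $H$, which the abstract does not state explicitly, and compares the average of $C(H \cdot \mathrm{SNR}, 1)$ with the AWGN value $C(\mathrm{SNR}, 1)$ at a low and a high signal-to-noise ratio. The expectation is evaluated with a plain midpoint rule; the integration limits and step count are arbitrary illustrative choices.

```python
import math

def awgn_capacity(snr):
    """C(P, sigma^2) = 0.5 * log2(1 + P / sigma^2), with snr = P / sigma^2."""
    return 0.5 * math.log2(1.0 + snr)

def rayleigh_capacity(snr, upper=60.0, steps=200000):
    """E[C(H * snr)] for H ~ Exp(1), by midpoint-rule integration (assumed fading law)."""
    dh = upper / steps
    total = 0.0
    for i in range(steps):
        h = (i + 0.5) * dh
        total += 0.5 * math.log2(1.0 + snr * h) * math.exp(-h)
    return total * dh

low_snr, high_snr = 0.01, 100.0
low_ratio = rayleigh_capacity(low_snr) / awgn_capacity(low_snr)
high_ratio = rayleigh_capacity(high_snr) / awgn_capacity(high_snr)
print(low_ratio, high_ratio)
```

Under these assumptions the fading channel retains more than 99 percent of the AWGN rate at the low SNR but less than 90 percent at the high SNR, which matches the qualitative statement that fading losses are small for low rate codes.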

By  using  adaptive  resource  sharing,  the  minimum  sum 
transmit  power  for  OA  could  be  reduced  at  the  cost  of  a  more 
complex  encoder/decoder  pair  supporting  variable  rates. 

If the users are located uniformly within a cell and the
attenuations grow with the distance $r_k$ to the receiver as
$\mu_k = r_k^p$, expressions for the long-term average transmission
power per user are found for WB accessing with stripping,
WB-NCD, and OA.

Abstract — An exact error probability analysis
clearly  demonstrates  that  adaptive  antenna  arrays  are 
unable  to  fully  exploit  the  implicit  diversity  effect  of 
Rayleigh  fading  channels.  Instead,  a  class  of  array 
receivers  that  yields  close-to-optimal  performance  is 
proposed. 


I.  System  Description 

Binary  signaling  is  treated,  and  the  channel  comprises  D 
diversity  links,  represented  by  means  of  D  receiver  antenna 
elements.  In  each  link  the  transmitted  waveform  is  perturbed 
by  two  time-varying  random  processes:  one  is  the  multiplica¬ 
tive  fading,  and  the  other  is  the  additive  noise.  The  mod¬ 
ulator  waveforms,  the  noise,  and  the  fading  are  assumed  to 
be  statistically  independent.  Further,  suppose  that  the  noise 
processes  in  the  different  diversity  links  are  equally  strong, 
independent, and white stationary Gaussian. The statistical
properties of the fading, which is assumed to be frequency-flat,
are described in detail in [1]; the correlation function depends
on the Doppler frequency shift, the direction of arrival (DOA),
the angle spread (AS), and the antenna geometry.

II.  Optimum  and  Suboptimum  Reception 

Optimum one-shot detection of binary signals on Rayleigh
fading channels requires the continuous-time received signal to
pass through a time-varying filter whose impulse response in
general cannot be found in closed form [2]. In order to avoid
such complex operations on the received signal, the suboptimum
approach suggested in [3] will be followed here. First,
the received process is projected onto a finite-dimensional basis
to obtain a finite set of N observation variables. The
Karhunen-Loève expansion (KLE) is known to be optimal, since it leads
to uncorrelated observables. The basis is thus found as the
solution to a vector-valued homogeneous Fredholm equation of
the second kind. Here, time-orthogonal modulator waveforms
are employed to make the kernel of the Fredholm equation
independent of the hypotheses. Second, given this finite set of
observables, the optimum one-shot detector performs a binary
likelihood ratio test. However, the KLE is far too complex
for practical implementation; recall that no closed-form
expression for the basis exists, and note that the detector
has to re-solve (numerically) for the basis whenever the
kernel changes, i.e., whenever the Doppler shift, the DOA, or the
AS changes. For a single-antenna system, a simple set of
time-orthonormal basis functions (ON set) has proved to give
performance comparable to that of the KLE [3]. Our proposal is
then to employ the very same ON basis in each antenna.

1This  work  was  supported  by  Grant  PCC-9706-01. 


Further,  the  concept  of  adaptive  antenna  arrays  suggests 
a  weighted  sum  of  the  antenna  signals  to  be  formed.  Much 
research  has  been  devoted  to  derive  antenna  weights,  but  the 
underlying  models  are  mostly  free  from  fading — an  ideal  as¬ 
sumption  hardly  met  in  real  systems.  Two  weighting  prin¬ 
ciples  are  treated:  least  mean  square  (LMS)  and  maximum 
likelihood  (ML) .  The  LMS  algorithm  operates  by  aligning  the 
phases  of  the  antenna  signals,  erroneously  assuming  zero  AS, 
while  the  ML  weights  minimize  the  error  probability.  Once  a 
continuous-time  sum  has  been  formed,  the  KLE  still  consti¬ 
tutes  an  optimum  projection. 

III.  Calculation  Results  and  Conclusions 

Merely  for  brevity,  both  the  first  and  second  order  channel 
statistics  are  assumed  to  be  perfectly  estimated.  An  exact 
expression  for  the  probability  of  error  has  been  calculated  by 
means  of  the  method  given  in  [4] .  Figure  1  shows  that  adaptive 
antenna  arrays  are  suboptimal  regarding  error  performance  on 
Rayleigh  fading  channels.  The  error  rates  were  calculated  for 
a  2-element  array  with  antenna  separation=0.5  wavelength, 
DOA=45°,  AS=90°,  and  Doppler  frequency  shift=0.1  symbol 
rate.  The  antenna  patterns,  DOA=60°  and  AS=180°,  reveal 
a  fundamental  discrepancy  between  LMS  and  ML  weights. 


Fig.  1:  Error  probabilities  and  antenna  patterns. 

Abstract — We address the problem of designing
jointly  optimum  precoder  and  equalizer  for  a  MIMO 
spatial  multiplexing  system  using  the  MMSE  crite¬ 
rion.  Next,  we  compare  the  optimum  power  alloca¬ 
tion  policy  and  the  achievable  rate  region  for  such  a 
system  to  the  well-known  rate  maximizing  precoder 
and  decoder  design. 

I.  Introduction 

It is well known that the optimum precoder and decoder that
maximize the information rate decouple the MIMO channel
into parallel sub-channels and allocate bits and power on the
sub-channels according to the well-known water-pouring
strategy. This requires a variable coding and modulation scheme
across sub-channels, which is difficult to implement in
practice for a changing MIMO channel. For systems with fixed
modulation and coding, the optimum design should equalize the MIMO
channel. The MMSE design [1][2][3] optimally trades off noise
enhancement and channel equalization at both the transmitter
and receiver.


II.  System  Model 

Spatial multiplexing involves transmitting (and receiving)
independent data streams on separate antennas, through a
MIMO wireless channel to achieve unparalleled data rates.
Consider the following MIMO system equation:

$$X = GHFS + GN \qquad (1)$$

where $H$ is an $m \times n$ MIMO channel whose $(i,j)$-th entry
denotes the channel gain from the $j$-th transmit antenna to the $i$-th
receive antenna; $X$ is the $b \times 1$ received vector and $S$ is the
$b \times 1$ transmitted vector, where $b = \mathrm{rank}(H) \le \min(m,n)$ is
the number of independent data streams to be transmitted;
$N$ is the $m \times 1$ noise vector; finally, $G$ is the $b \times m$
decoder matrix and $F$ is the $n \times b$ precoder matrix. Assume

$$E(SS^*) = P, \quad E(NN^*) = R_{nn}, \quad E(SN^*) = 0, \qquad (2)$$

where the superscript $*$ denotes the conjugate transpose.
Define the eigenvalue decomposition (EVD):

$$H^* R_{nn}^{-1} H = V \Lambda V^* = \begin{pmatrix} V & V_{n-b} \end{pmatrix} \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V & V_{n-b} \end{pmatrix}^* \qquad (3)$$

where $V$ is an $n \times b$ orthogonal matrix which is the projector
onto the range space of $H^* R_{nn}^{-1} H$ and $\Lambda$ is a diagonal
matrix containing the $b$ non-zero eigenvalues, arranged in
decreasing order from top-left to bottom-right.

The optimum precoder and equalizer that minimize the
total output symbol estimation error under the MMSE
criterion were shown to diagonalize the MIMO channel [1][3]. The
optimum transmitter power allocation policy across the
sub-channels is given by [1][2][3]:

$$\Phi_f^2 = \big( \mu^{-1/2} \Lambda^{-1/2} - \Lambda^{-1} \big)_+ \qquad (4)$$

where $\Phi_f^2$ is a diagonal matrix of transmitter powers across
sub-channels and $\mu$ is computed so that the total power
constraint $\mathrm{tr}(\Phi_f^2) = P_0$ is satisfied. This is compared to the
well-known water-pouring policy given as:

$$\Phi_f^2 = \big( \mu^{-1} I - \Lambda^{-1} \big)_+ \qquad (5)$$

The maximum data rate for the $i$-th sub-channel (for both
designs) is given by:

$$C_i = \log\big( 1 + \lambda_i \phi_i^2 \big) \qquad (6)$$

where $\phi_i^2$, the $i$-th diagonal entry of $\Phi_f^2$, is obtained from (4) or (5).
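
The two allocation policies in (4) and (5) can be compared numerically. The sketch below assumes three hypothetical sub-channel eigenvalues and a total power budget, finds the level ($\mu^{-1/2}$ for the MMSE policy, $\mu^{-1}$ for water-pouring) by bisection so that the powers sum to the budget, and evaluates the per-sub-channel rates of (6) with a base-2 logarithm. The function names, the numerical values and the choice of logarithm base are illustrative assumptions.

```python
import math

def solve_level(power_fn, total, iters=200):
    """Bisection on the level so that the allocated powers sum to `total`."""
    lo, hi = 0.0, 1.0
    while sum(power_fn(hi)) < total:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(power_fn(mid)) < total:
            lo = mid
        else:
            hi = mid
    return power_fn(0.5 * (lo + hi))

def mmse_powers(lams, total):
    """Policy (4): phi_i^2 = (nu / sqrt(lam_i) - 1 / lam_i)_+ with nu = mu^(-1/2)."""
    return solve_level(
        lambda nu: [max(0.0, nu / math.sqrt(l) - 1.0 / l) for l in lams], total)

def waterpouring_powers(lams, total):
    """Policy (5): phi_i^2 = (w - 1 / lam_i)_+ with w = 1 / mu."""
    return solve_level(
        lambda w: [max(0.0, w - 1.0 / l) for l in lams], total)

def rates(lams, powers):
    """Per-sub-channel rate (6), in bits (log base 2 is an assumption)."""
    return [math.log2(1.0 + l * p) for l, p in zip(lams, powers)]

lams = [4.0, 1.0, 0.25]   # hypothetical non-zero eigenvalues, decreasing
P0 = 3.0                  # hypothetical total power
mmse = mmse_powers(lams, P0)
wp = waterpouring_powers(lams, P0)
rate_mmse = sum(rates(lams, mmse))
rate_wp = sum(rates(lams, wp))
print(mmse, wp, rate_mmse, rate_wp)
```

For these values water-pouring switches off the weakest sub-channel and gives the strongest one the most power, whereas the MMSE policy gives the second sub-channel more power than the first and therefore achieves a lower sum rate.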


III.  Results 

The  MMSE  design,  like  the  rate-maximizing  design,  allocates 
non-zero  power  on  sub-channels  with  highest  SNRs.  How¬ 
ever,  among  the  above  chosen  sub-channels,  power  is  allocated 
inversely  proportional  to  sub-channel  SNRs,  unlike  the  rate- 
maximizing  design.  This  subtle,  yet  important  difference  leads 
to  a  loss  in  data  rate  which  we  now  quantify.  The  capacity  hit 
suffered  by  the  MMSE  design  (under  high  SNRs)  is  given  by: 
SC  =  log  where  g  =  6^-- •  When  H  is  an  or¬ 


thogonal  matrix  (A  — t  /),  the  capacity  hit  is  zero  i.e. ,  SC  =  0. 
For  an  iid  channel  matrix,  the  capacity  hit  suffered  by  MMSE 
design  is  minimal,  while  a  more  trivial  channel  inversion  strat¬ 
egy  at  the  transmitter  suffers  a  great  hit  in  capacity  due  to 
noise  enhancement. 

The  MMSE  policy  ensures  similar  sub-channel  SNRs  (when 
compared  to  the  water-pouring  policy)  and  hence  favors  iden¬ 
tical  but  lower  data  rate  transmission  across  sub-channels. 

Abstract — This paper addresses a problem in error
control coding when a user has access to either one
of two identical channels or may have access to both
channels simultaneously. In particular, puncturing sequences
matched to this type of scenario are identified
for select convolutional codes.

I.  INTRODUCTION 

Communications networks are usually designed to handle a
specified peak capacity that may occur during busy hours or
in highly populated regions. It is, therefore, not uncommon
that such networks have excess idle capacity a good percentage
of the time. The network can use this excess capacity to
enhance the performance of other users accessing the network
during off-peak times. This scenario was particularly motivated
by global satellite systems [1]. In such systems most
latitude bands provide for double coverage; that is, one user
can see two satellites most of the time. When satellite capacity
is under-utilized, each user can access two radio channels
between a user and two distinct satellites. The simplest (and
common) error control coding strategy is to use repetition coding
when two channels are available: transmit the same code
twice, once over each channel. The receiver can then use any
soft combining method it desires when it receives both channels.
In this case, a practical coding gain provides an extra
2.5 dB. The next section describes a strategy that takes better
advantage of the availability of the two channels.

II.  DUAL  PUNCTURED  CODES 

Consider a digital communications system having access to
two channels: channel-1 and channel-2. Each channel is
bandwidth restricted such that only a rate-k/n code can
be used. The intended receiver can receive the information
from channel-1 only, or channel-2 only, or both channel-1 and
channel-2 simultaneously. A good error control coding strategy
is the following.

•  Use or construct the optimal (or best known) convolutional
code C with rate k/2n.

•  Find a dual-puncturing scheme that divides C into the
two codes C1 and C2, each with rate k/n. Dual refers
to a puncturing sequence and its 2's complement (in the
examples of Section III, the bitwise complement of the
puncturing vector). The dual-puncturing scheme is such that
the free distances of C1 and C2 are the same, and are as good
as those of the best known punctured codes.

•  Transmit C1 over channel-1.

•  Transmit C2 over channel-2 when available.

•  If the receiver receives channel-1 or channel-2 only, it
can decode a rate-k/n punctured code using the rate-k/2n
code decoder.

•  If the receiver receives both channel-1 and channel-2,
it decodes a rate-k/2n code with the same rate-k/2n
decoder as above.


III.  SEARCH  RESULTS 

(A)  Rate-1/3, K = 3 Convolutional Code

The first example starts with a rate-1/3, constraint length
K = 3 (4 states) convolutional code. The octal form of the
code generator is G = (5, 7, 7). Using an exhaustive search,
the best puncturing of the rate-1/3 code to a rate-2/3 code
is derived from a puncturing sequence which is a periodic
repetition of the following puncturing vector of length 6:

P = (1 0 0 0 1 1)

to obtain C1, with a dual puncturing vector given by

P = (0 1 1 1 0 0)

to obtain C2. Moreover, dfree(C1) = dfree(C2) = 4.
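
The splitting of one rate-1/3 codeword stream into the two punctured streams can be sketched as follows. This is only an illustration under stated assumptions: the encoder starts in the all-zero state, the generator taps are aligned with the newest input bit in the most significant position, and the helper names `conv_encode`, `dual`, and `puncture` are hypothetical. The sketch does not compute free distances or perform the exhaustive search.

```python
from typing import List, Sequence, Tuple


def conv_encode(bits: Sequence[int],
                gens: Sequence[int] = (0o5, 0o7, 0o7),
                K: int = 3) -> List[int]:
    """Rate-1/len(gens) feedforward convolutional encoder, zero initial state."""
    state = 0                          # the K-1 most recent input bits
    out: List[int] = []
    for b in bits:
        reg = (b << (K - 1)) | state   # newest bit in the most significant position
        for g in gens:
            out.append(bin(reg & g).count("1") % 2)  # parity of the tapped bits
        state = reg >> 1               # drop the oldest bit
    return out


def dual(p: Sequence[int]) -> Tuple[int, ...]:
    """Bitwise complement of a puncturing vector."""
    return tuple(1 - x for x in p)


def puncture(coded: Sequence[int], p: Sequence[int]) -> List[int]:
    """Keep coded[i] when the periodically repeated vector p has a 1 at position i."""
    return [c for i, c in enumerate(coded) if p[i % len(p)] == 1]


P1 = (1, 0, 0, 0, 1, 1)
P2 = dual(P1)
coded = conv_encode([1, 0, 1, 1])      # 4 information bits -> 12 coded bits
c1 = puncture(coded, P1)               # stream for channel-1
c2 = puncture(coded, P2)               # stream for channel-2
```

Each coded bit ends up on exactly one channel, so every channel carries 6 bits for 4 information bits (rate 2/3), while a receiver that hears both channels sees the full rate-1/3 codeword.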

A commonly used puncturing sequence is a periodic sequence

P = (1 0 1 0 1 0 ...)

with a resulting dual puncturing vector

P = (0 1 0 1 0 1 ...)

However, in this case, dfree(C1) = 4, but dfree(C2) = 3.

In  comparison,  the  best  known  rate-2/3,  K  =  3  code  has 
^free  — 

(B)  Rate-1/3,  K  =  4,  G  =  (13, 15, 17) 

The best puncturing vector of length 12 and its dual are
given by

P = (1 0 0 1 0 1 1 0 1 0 1 0)

P = (0 1 1 0 1 0 0 1 0 1 0 1)

resulting in rate-2/3 codes such that dfree(C1) = dfree(C2) = 5.
The best known rate-2/3, K = 4 code has dfree = 7.

(C)  Rate-1/3, K = 5, G = (25, 33, 37)

The best puncturing vector of length 12 is given by

P = (1 1 1 1 0 0 0 0 1 1 0 0)

resulting in rate-2/3 codes with dfree(C1) = dfree(C2) = 6.
The best known rate-2/3, K = 5 code has dfree = 8.

References 

[1]  A. A. Hassan, B. D. Molnar, and Y-P E. Wang, "Coding and
modulation for mobile satellite systems," Proceedings of the
Vehicular Technology Conference, 1997, pp. 1997-2001.


0-7803-5857-0/00/$10.00 ©2000 IEEE.




ISIT 2000, Sorrento, Italy, June 25-30, 2000


On  the  Linear  Structure  of  Self-Similar  Processes1 

Carl  J.  Nuzman  H.  Vincent  Poor 

Dept. of Electrical Engineering
Princeton University
Princeton, NJ 08540 USA
e-mail: {cjnuzman,poor}@ee.princeton.edu


Abstract — Self-similar processes have a rich linear
structure, based on scale invariance, which is analogous
to the shift-invariant structure of stationary
processes. The analogy is made explicit via Lamperti's
transformation. This transformation is used
here to characterize the reproducing kernel Hilbert
space (RKHS) associated with self-similar processes
and hence to solve problems of prediction, whitening,
and Gaussian signal detection. Some specific results
for the fractional Brownian motion illustrate the general
concepts.

I.  Self-Similar  Processes 
An H-self-similar (or H-ss) stochastic process is one whose
distributions are essentially invariant to scaling of the time
axis. More precisely, scaling by a factor $a > 0$ has the same
effect as multiplying the process by a factor $a^H$:

$\{Y(at)\} \stackrel{d}{=} \{a^H Y(t)\}, \quad a > 0,$

where the notation $\stackrel{d}{=}$ indicates that the two processes have
the same probability law, and where H is referred to as the
self-similarity parameter of the process. In recent decades
self-similar processes have found application in diverse fields
including hydrology, medicine, finance, physics, and electrical
engineering.

As noted in [1], Lamperti's transformation $L_H$ given by

$(L_H Y)(t) = e^{-Ht} Y(e^t)$

invertibly maps an H-ss process Y on $\mathbb{R}_+$ to a stationary
process $L_H Y$ on $\mathbb{R}$. The process $L_H Y$ is the stationary generator
of Y. Recent applications of Lamperti's transformation can
be found in [2], [3], [4], and [5].
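
A small numerical check can make the stationarity claim concrete. The sketch below assumes the standard covariance of fractional Brownian motion with unit variance at time 1, which is not written out in the text, and verifies that the covariance of the transformed process depends only on the time difference. The function names and the value H = 0.7 are hypothetical choices for illustration.

```python
import math


def fbm_cov(s: float, t: float, H: float) -> float:
    """Covariance of standard fractional Brownian motion at times s, t > 0."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))


def generator_cov(s: float, t: float, H: float) -> float:
    """Covariance of X = L_H Y, where X(t) = exp(-H t) Y(exp(t)) and Y is fBm."""
    return math.exp(-H * (s + t)) * fbm_cov(math.exp(s), math.exp(t), H)


H = 0.7
lag_one_near_origin = generator_cov(0.0, 1.0, H)
lag_one_shifted = generator_cov(2.0, 3.0, H)   # same lag, shifted by 2
variance = generator_cov(0.0, 0.0, H)
```

The two lag-one covariances agree up to rounding error, as expected for a stationary generator, and the variance of the generator equals 1 under the assumed normalization.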

II.  RKHS  Structure  of  Self-Similar  Processes 
The reproducing kernel Hilbert space (RKHS) formalism
can be used to describe the linear space of a random process,
and to describe the solutions to linear problems such
as Gaussian signal detection, prediction, and whitening [6],
[7]. Given a random process Y(t) on an index set I, there is
an isomorphism J which maps random variables in the linear
space $L_2(Y, I)$ of Y to functions in a specially-structured
Hilbert space S(Y, I) (an RKHS). The solutions to many problems
of interest are known once we have answered the following
questions: Which functions belong to S(Y, I)? For
$f, g \in S(Y, I)$, how can the inner product $\langle f, g \rangle$ be computed?
For $f \in S(Y, I)$, how can the random variable $J^{-1}(f)$ be
expressed?

1  Research supported in part by the Office of Naval Research
under Grant N00014-00-1-0141, in part by the National Science
Foundation under Grant CCR-9979361, and in part by the U.S.
Department of Defense NDSEG Fellowship Program.


The  result  below  demonstrates  that  if  these  questions  can 
be  answered  for  the  stationary  generator  of  an  H-ss  process, 
then  they  can  easily  be  answered  for  the  H-ss  process  itself. 

Theorem 1 Suppose that Y is an H-ss process on $I \subset \mathbb{R}_+$,
and that $X = L_H Y$ is its stationary generator. Denote by
$J_Y : L_2(Y, I) \to S(Y, I)$ and $J_X : L_2(X, \ln I) \to S(X, \ln I)$
the RKHS isomorphisms associated with each process. Then
$L_2(Y) = L_2(X)$ and for each $g \in S(Y, I)$, $J_X(J_Y^{-1}(g)) = L_H g$.

The RKHS's associated with stationary processes on semi-infinite
index sets can be characterized using spectral factorization
and linear time-invariant systems. Applying Theorem 1,
we can describe the RKHS's associated with H-ss processes
on a variety of index sets, using linear self-similar systems.

III.  Application  to  Fractional  Brownian 
Motion 

Specializing the general results of the previous section, we
characterize the RKHS associated with fractional Brownian
motion (fBm) on various index sets, extending results in [8]
and [9]. We also give conditions for non-singular discrimination
of the usual fBm from the Barnes-Allan fBm.
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The  redundancy-rate  problem  of  universal  fixed-to- 
variable length coding for a class of sources consists in
determining by how much the actual code length exceeds the
optimal (ideal) code length. In a minimax scenario one finds
the additional "price" on top of entropy incurred (at least) by
any code in order to be able to cope with all sources. While
Shields [5] proved that there is no function o(n) which is a
rate bound on the redundancy for the class of all ergodic
processes, it has been known for some time (cf. [3]) that, for
certain parametric families of sources (e.g., memoryless and
Markov sources), the redundancy can be as small as $\Theta(\log n)$,
where n is the block length. There was no interesting bound
for a class of sources that lies between $\Theta(\log n)$ and general
o(n) until recently, when Csiszár and Shields [1] designed a
renewal class of sources that yields a $\Theta(\sqrt{n})$ bound. In this
paper, we provide a precise asymptotic expansion of the
redundancy for renewal sources up to the constant term.

Given a probabilistic source model, we let $P(x_1^n)$ be the
probability of the message $x_1^n \in \mathcal{A}^n$. For a given code $C_n$,
we denote by $L(C_n, x_1^n)$ the code length for $x_1^n$. The pointwise
redundancy is defined as $R_n(C_n, P; x_1^n) = L(C_n, x_1^n) + \lg P(x_1^n)$.
The (asymptotic) strong redundancy-rate problem consists in
determining for a class $\mathcal{S}$ of source models the rate of growth
of the minimax quantities

$R_n^*(\mathcal{S}) = \min_{C_n} \sup_{P \in \mathcal{S}} \max_{x_1^n} R_n(C_n, P; x_1^n)$

where the supremum is taken over all distributions P in $\mathcal{S}$.

A  substantial  literature  is  available  on  the  redundancy 
problem.  The  following  results  are  known: 

•  If $\mathcal{M}$ is the class of i.i.d. sources or of Markov chains, or more
generally the process belongs to a finitely parameterizable class of
dimension K, then it was established that
$R_n(\mathcal{M}) \sim R_n^*(\mathcal{M}) \sim \frac{K}{2} \log n$ (cf. Rissanen [3]).

•  Csiszár and Shields [1] have studied order-r Markov renewal
sequences in which a 1 is inserted every $T_0, T_1, \ldots$ of 0's, where
$\{T_i\}$ is either an i.i.d. or Markov renewal or r-order Markov
renewal process. We denote such sources as $\mathcal{R}_r$. The authors
of [1] proved that $R_n(\mathcal{R}_r) = R_n^*(\mathcal{R}_r) = \Theta(n^{(r+1)/(r+2)})$ for
r = 1, 2, ..., which specializes to $\Theta(\sqrt{n})$ when r = 0.

•  Shields [5] proved that there is no function p(n) = o(n) which
is a weak-rate bound for the class of all ergodic processes.

•  Louchard and Szpankowski [2], Savari [4], and Wyner [7]
proved that the Lempel-Ziv codes in the class of i.i.d. and
Markov processes have either rate $\Theta(n/\log n)$ (for LZ'78) or
$\Theta(n \log\log n/\log n)$ (for LZ'77 code).

1  This  work  was  supported  in  part  by  NSF  Grants  NCR-9415491 
and  C-CR-9804760,  and  Purdue  Grant  GIFG-9919. 


We now present our main result and start with a precise
definition of the class $\mathcal{R}_0$ of renewal processes and its associated
sources. Let $T_1, T_2, \ldots$ be a sequence of i.i.d. positive-valued
random variables with distribution $Q(j) = \Pr\{T_1 = j\}$. An
independent random variable $T_0$ is introduced with distribution
$\Pr\{T_0 = i\} = E[T_1]^{-1} \sum_{j > i} Q(j)$, provided $E[T_1] < \infty$. The
quantities $\{T_i\}_{i \ge 1}$ are the interarrival times, while $T_0$ is the
initial waiting time. The process $T_0, T_0 + T_1, T_0 + T_1 + T_2, \ldots$
is then called a renewal process and it is stationary whenever
$T_0$ has the distribution above. With such a renewal process
there is associated a binary renewal sequence, that is, a
0,1-sequence in which the 1's occur exactly at the renewal epochs
$T_0$, $T_0 + T_1$, $T_0 + T_1 + T_2$, etc.
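
The construction above can be sketched in a few lines. The sketch assumes a finite-support interarrival distribution Q given as a dictionary, computes the stationary initial waiting-time distribution Pr{T0 = i} = (Σ_{j>i} Q(j)) / E[T1], and lays out a binary renewal sequence for given waiting times. Positions are counted from 0 here, which is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def initial_wait_distribution(Q: Dict[int, Fraction]) -> Dict[int, Fraction]:
    """Pr{T0 = i} = (sum of Q(j) over j > i) / E[T1], for i = 0, 1, ..., max(j) - 1."""
    mean = sum(j * q for j, q in Q.items())
    return {i: sum(q for j, q in Q.items() if j > i) / mean
            for i in range(max(Q))}


def renewal_sequence(t0: int, interarrivals: List[int], n: int) -> List[int]:
    """0,1-sequence of length n with 1's at the epochs T0, T0+T1, T0+T1+T2, ..."""
    seq = [0] * n
    epoch = t0
    for t in [0] + interarrivals:
        epoch += t
        if epoch >= n:
            break
        seq[epoch] = 1
    return seq


# Hypothetical example: interarrival times 1 or 2 with equal probability.
Q = {1: Fraction(1, 2), 2: Fraction(1, 2)}
wait = initial_wait_distribution(Q)
example = renewal_sequence(1, [2, 1, 3], n=8)
```

Because the tail sums Σ_{j>i} Q(j) add up to E[T1], the computed initial distribution is properly normalized.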

Shtarkov's maximum-likelihood technique [6] implies

$\log_2 \sum_{x_1^n} \sup_Q P(x_1^n) \le R_n^*(\mathcal{R}_0) \le \log_2 \left( \sum_{x_1^n} \sup_Q P(x_1^n) \right) + 1$

where the supremum is taken over all distributions Q. We use this
bound to prove our main result.


Theorem 1 Consider the class $\mathcal{R}_0$ of renewal sources. Then
the minimax redundancy $R_n^*$ of the renewal process satisfies

Rn(R°)  =  log^  ^  I0®2  ^  I0®2  ^g 

for  large  n. 


This  result  is  proved  by  complex-analytic  methods  that 
include  generating  functions,  Mellin  transforms,  singularity 
analysis  and  saddle  point  estimates.  Thus,  this  work  places 
itself  within  the  framework  of  analytic  information  theory. 
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Abstract  —  The  problem  of  tracking  an  exponen¬ 
tially  unstable  scalar  source  process  across  a  noisy 
channel is considered. We introduce a new parametric
notion of capacity that we call "any-time capacity"
$C_{at}(\alpha)$. It is a twist on the familiar concept
of error-exponents and is always between the classical
Shannon capacity and the zero-error capacity. A
separation theorem is given which shows that $C_{at}(\alpha)$
characterizes the properties of a channel needed for
finite expected distortion.

I.  Introduction 

In a sense, the justification of Shannon capacity is the
classical source-channel separation theorem and its modern
refinements [9]. These tell us that for a wide class of sources,
channels, and distortion measures, two-part encodings suffice
as long as we are willing to tolerate delays.

Traditional rate-distortion theory [1] has focused almost
exclusively on stationary processes. While a broad class, it
excludes exponentially unstable processes which are important
in  practice,  especially  in  control  applications^].  Recently, 
there has been some work showing how to extend source coding
to such processes ([7], [3]). But these have implicitly
considered only noiseless channels. For noisy channels, the
situation was unclear since the traditional source-channel
separation theorem need not (and in fact, does not) apply.

II.  Why  Classical  Separation  Fails 

Consider  the  simplest  of  all  unstable  processes: 

$X_{t+1} = A X_t + W_t, \quad t \ge 0, \quad A > 1$  (1)

where $\{X_t\}$ is an $\mathbb{R}$-valued state process and $\{W_t\}$ is a
bounded noise process. Assume $X_0 = 0$ for
convenience. This process is non-stationary and has infinite
variance as t goes to $\infty$. Our per-letter distortion measure is
the usual $d(X, \hat{X}) = (X - \hat{X})^2$.

$\forall \delta > 0$, sequential rate distortion theory ([8], [7]) gives
encoders which can track this process with finite expected
distortion using $(\log_2 A + \delta)$ bits per sample. They quantize
$(X_t - A X_{t-1})$ at each time and recursively track the source.

If we attempt to apply the usual separation results,
we would pick an $\epsilon > 0$ for which $\exists (N, \mathcal{E}_N, \mathcal{D}_N)$ with
$P_e(\mathcal{E}_N, \mathcal{D}_N) < \epsilon$ across a noisy channel. For Shannon
capacity, while this per-bit probability of error can be made
arbitrarily small, it cannot be made exactly zero. Eventually, a
mistake will be made. The effect will be compounded at every
subsequent time step since it will get repeatedly multiplied by
A > 1 in the source decoder's recursion. The expected
per-letter distortion will thus tend to infinity with probability one,
regardless of how small an $\epsilon$ we choose in our channel code!
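
The compounding of a single decoding mistake can be seen in a toy simulation. The sketch below is an idealization, not the scheme of the paper: it omits quantization, feeds the decoder the exact innovations $X_{t+1} - A X_t$, and injects one hypothetical error of size 1 at a chosen time. The function name `track` and the noise values are illustrative assumptions.

```python
from typing import List, Optional


def track(A: float, noise: List[float],
          corrupt_at: Optional[int] = None,
          corruption: float = 0.0) -> List[float]:
    """Simulate X_{t+1} = A X_t + W_t and return the decoder's tracking errors."""
    x, x_hat = 0.0, 0.0
    errors: List[float] = []
    for t, w in enumerate(noise):
        x_next = A * x + w
        innovation = x_next - A * x        # what an ideal encoder would describe
        if t == corrupt_at:
            innovation += corruption       # a single channel decoding mistake
        x_hat = A * x_hat + innovation     # source decoder's recursion
        x = x_next
        errors.append(x_hat - x)
    return errors


# Bounded, exactly representable noise samples chosen for illustration.
noise = [0.5, -0.25, 0.125, -0.5, 0.25, 0.5, -0.125, 0.25]
errors = track(A=2.0, noise=noise, corrupt_at=3, corruption=1.0)
clean = track(A=2.0, noise=noise)
```

After the single mistake, the tracking error doubles at every step (it is multiplied by A = 2), so the squared error grows without bound even though no further mistakes occur.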

1Work  under  Prof.  Sanjoy  Mitter  and  supported  by  U.S.  Army 
Grant  PAAL03-92-G-0115. 


III.  “Any-time”  Capacity 
Definition III.1 The α-any-time capacity $C_{at}(\alpha)$ of a channel
is the maximal rate at which the channel can be used to
transmit data with a probability of error that decays to zero
with delay at least exponentially at a rate α.

$C_{at}(\alpha) = \sup\{R \mid \exists (\mathcal{E}_R, K),\ \forall N > 0,\ \exists \mathcal{D}_R^N,\ P_e(\mathcal{E}_R, \mathcal{D}_R^N) < K 2^{-\alpha N}\}$

The  above  definition  is  very  close  to  the  definition  of  the 
reliability  function  E(R)  of  a  channel  given  in  [2].  The  crucial 
difference  is  that  while  we  require  the  encoder  to  be  fixed,  in 
the  standard  definition  of  error  exponents  both  the  encoder 
and  decoder  vary  with  delay  N. 

Theorem III.1 [6] For the AWGN channel with noiseless
feedback, $C_{at}(\alpha) = C$ regardless of the value of α.

Theorem III.2 [5] For the binary erasure channel with
noiseless feedback and probability of erasure e:

$C_{at}\left(\eta - \log_2(1 + (2^\eta - 1)e)\right) = 1 - \frac{1}{\eta} \log_2(1 + (2^\eta - 1)e)$

if you let η range over (0, ∞).

Amazingly, [6] shows that α-any-time capacity is also nonzero
for these channels even without any feedback!

IV.  Separation  For  Unstable  Processes 

Theorem IV.1 The source in (1) can be tracked with finite
MSE across a noisy channel iff there is an $\epsilon > 0$ for which
$C_{at}(2 \log_2 A + \epsilon) > \log_2 A$ for the channel.
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Almtmct.  We  discuss  how  to  identify  a  Markov 
information  source  with  transition  matrix  P  by  only 
observing a sequence of symbols generated by the
source. 

I.  Introduction 

Markov information sources play an important role in modelling
subjects to be studied as a stochastic process, for example,
block ciphers, speech recognition, recognition of human
genes in DNA, and so on. Suppose that all we can do is
observe a sequence of symbols generated by a Markov source.
To identify an N-state simple Markov source with transition
matrix P which takes symbols in $S = \{1, 2, \ldots, N\}$, it is natural
to estimate directly all the elements $p_{ij}$ $(i, j = 1, 2, \ldots, N)$
in P by using $N^2$ histograms of possible strings of length 2.
However, this method requires too many histograms if the
number of states N becomes large. Since statistics of sequences
generated by the Markov source are primarily governed
by the eigenvalues of P, one simple way to identify the
source is to estimate the eigenvalues of P. In this case, the number
of eigenvalues in question is N − 1. Hence we discuss how to
estimate the characteristic polynomial of P by using histograms
whose number is of the order of N.

II.  Algorithm 

The characteristic polynomial of P is expressed as in (1).


Next, select strings of length m whose first symbol
$u_0$ $(u_0 \in S)$ is the same. We denote elements of this class of
strings by $u^{(u_0, m)}$. We investigate 2N − 2 histograms

of  random  variables  “.Mtln*"11'’"*),  ( m  =  1,  2.  ■  ■  • ,  2.Y  —  2). 
Note  that  p„0  is  to  be  estimated  when  m  =  1. 

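Thus we obtain a nonsymmetric Toeplitz system of equations

$\begin{bmatrix} A(N-1) & A(N-2) & \cdots & A(1) \\ A(N) & A(N-1) & \cdots & A(2) \\ \vdots & \vdots & & \vdots \\ A(2N-3) & A(2N-4) & \cdots & A(N-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{N-1} \end{bmatrix} = - \begin{bmatrix} A(N) \\ A(N+1) \\ \vdots \\ A(2N-2) \end{bmatrix},$  (8)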

where the A(m)'s are constants determined by the means and
variances of the 2N − 2 histograms introduced above. Note that (8)
is like the Wiener-Hopf equation [1]. We remark here that the
unknown parameters are $a_i$ and the eigenvalues of P, $\lambda_i \ne 1$
$(i = 1, 2, \ldots, N - 1)$, and hence the number of these is equal
to that of different strings of symbols. This implies the above
method is based on the minimum number of histograms.
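
To illustrate how a system of the form (8) yields the coefficients of the characteristic polynomial, the sketch below solves it exactly with rational arithmetic. It rests on a loud assumption that is not stated in the text: the constants A(m) are replaced by a hypothetical stand-in, a sum of m-th powers of the eigenvalues other than 1, which satisfies the linear recurrence whose coefficients are the $a_i$. How the A(m) are actually obtained from histograms is not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List


def solve_linear(M: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Gauss-Jordan elimination over exact fractions (M assumed nonsingular)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def coefficients_from_A(A: Callable[[int], Fraction], N: int) -> List[Fraction]:
    """Solve the (N-1) x (N-1) Toeplitz system for a_1, ..., a_{N-1}."""
    M = [[A(N - 1 + i - j) for j in range(N - 1)] for i in range(N - 1)]
    rhs = [-A(N + i) for i in range(N - 1)]
    return solve_linear(M, rhs)


# Hypothetical 3-state example with eigenvalues 1, 1/2 and -1/5.
lams = [Fraction(1, 2), Fraction(-1, 5)]
A = lambda m: sum(l ** m for l in lams)    # stand-in for histogram-derived constants
a = coefficients_from_A(A, N=3)
```

Under this assumption the solution reproduces the factor $(x - 1/2)(x + 1/5) = x^2 - \frac{3}{10}x - \frac{1}{10}$, so the eigenvalues other than 1 can be read off as its roots.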


$\varphi(P) = (x - 1)(x^{N-1} + a_1 x^{N-2} + \cdots + a_{N-1})$.  (1)

Denote an arbitrary string of length m by

$u = u_0 u_1 \cdots u_{m-1}, \quad u_k \in S \quad (k = 0, 1, \ldots, m - 1)$.  (2)


Next, let

$u^{(r)} = u_0^{(r)} u_1^{(r)} \cdots u_{m-1}^{(r)} \quad (r = 0, 1, \ldots, N^m - 1)$  (3)

be the r-th string with elements $u_k^{(r)} \in S$ $(k = 0, 1, \ldots, m - 1)$.

Let $\{X_n\}$ $(X_n \in S)$ be a sequence generated by a
Markov source. We introduce a binary random variable,


WD-"'1) 


{1  (A'„A'„+i  *  *  •  A ,, ) ,,, ..  i  —  u 

0  (A„ A'„+i  ■  •  ■  A", #  «' 


=  «tr)) 
#  «,r)) 


j»/r(w‘T,)  =  53y;(u(r) 


First  choose  strings  with  symmetry  such  that. 


"0  ~  "m- 2’ 


r>  /  (f,j  =0.1,-  -.  y  )  «‘r)  #  «‘r)- 


III.  Concluding  Remarks 

Computer  simulations  are  carried  out  for  2-state  Markov 
chains.  Unfortunately  experimental  estimations  of  A(2)  are 
not  in  accordance  with  theoretical  ones  because  estimating 
variances  of  histograms  of  —Ml(uu'°'2))  are  not  successful. 

If we do not use the variances of histograms, the algorithm
is  to  be  modified  as  follows.  Denote  any  strings  of  length  m 
with  «[)  1  =  wJ,,L|  by  ti(,,0'"‘>.  Means  of  2N  —  2  histograms  of 

random  variables  y•M/,(u(“0,”>,)  (»«  =  1,2,  --,2A  —  2)  give 

coefficients A(m) of another Toeplitz system of equations.
Computer simulations based on this algorithm are also carried
out for 2-state and 3-state Markov chains. The identification
of 2-state Markov chains is successful. However, experimental
estimates of A(4) are not in accordance with the theoretical
ones in the identification of 3-state Markov chains. Further
investigation is needed, and problems remain with the algorithm
based on the minimum number of histograms.
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